
Important Concepts in Informatica 8.6
DEBUGGER

To debug a mapping, you configure and run the Debugger from within the Mapping Designer. The Debugger uses a session to run the mapping on the Integration Service. When you run the Debugger, it pauses at breakpoints and you can view and edit transformation output data. You might want to run the Debugger in the following situations:

Before you run a session. After you save a mapping, you can run some initial tests with a debug session before you create and configure a session in the Workflow Manager.
After you run a session. If a session fails or if you receive unexpected results in the target, you can run the Debugger against the session. You might also want to run the Debugger against a session if you want to debug the mapping using the configured session properties.

Debugger Session Types: You can select three different debugger session types when you configure the Debugger. The Debugger runs a workflow for each session type. You can choose from the following Debugger session types when you configure the Debugger:

Use an existing non-reusable session. The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs the non-reusable session and the existing workflow. The Debugger does not suspend on error.
Use an existing reusable session. The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs a debug instance of the reusable session and creates and runs a debug workflow for the session.
Create a debug session instance. You can configure source, target, and session configuration properties through the Debugger Wizard. When you run the Debugger, the Integration Service runs a debug instance of the debug workflow and creates and runs a debug workflow for the session.

Debug Process

To debug a mapping, complete the following steps:
1. Create breakpoints. Create breakpoints in a mapping where you want the Integration Service to evaluate data and error conditions.
2. Configure the Debugger. Use the Debugger Wizard to configure the Debugger for the mapping. Select the session type the Integration Service uses when it runs the Debugger. When you create a debug session, you configure a subset of session properties within the Debugger Wizard, such as source and target location. You can also choose to load or discard target data.
3. Run the Debugger. Run the Debugger from within the Mapping Designer. When you run the Debugger, the Designer connects to the Integration Service. The Integration Service initializes the Debugger and runs the debugging session and workflow. The Integration Service reads the breakpoints and pauses the Debugger when the breakpoints evaluate to true.
4. Monitor the Debugger. While you run the Debugger, you can monitor the target data, transformation and mapplet output data, the debug log, and the session log. When you run the Debugger, the Designer displays the following windows:

Debug log. View messages from the Debugger.
Target window. View target data.
Instance window. View transformation data.

5. Modify data and breakpoints. When the Debugger pauses, you can modify data and see the effect on transformations, mapplets, and targets as the data moves through the pipeline. You can also modify breakpoint information. The Designer saves mapping breakpoint and Debugger information in the workspace files. You can copy breakpoint information and the Debugger configuration to another mapping. If you want to run the Debugger from another Power Center Client machine, you can copy the breakpoint information and the Debugger configuration to the other Power Center Client machine.

Running the Debugger: When you complete the Debugger Wizard, the Integration Service starts the session and initializes the Debugger. After initialization, the Debugger moves in and out of running and paused states based on breakpoints and commands that you issue from the Mapping Designer. The Debugger can be in one of the following states:

Initializing. The Designer connects to the Integration Service.
Running. The Integration Service processes the data.
Paused. The Integration Service encounters a break and pauses the Debugger.

Note: To enable multiple users to debug the same mapping at the same time, each user must configure different port numbers in the Tools > Options > Debug tab. The Debugger does not use the high availability functionality.

Monitoring the Debugger: When you run the Debugger, you can monitor the following information:

Session status. Monitor the status of the session.
Data movement. Monitor data as it moves through transformations.
Breakpoints. Monitor data that meets breakpoint conditions.
Target data. Monitor target data on a row-by-row basis.

The Mapping Designer displays windows and debug indicators that help you monitor the session:

Debug indicators. Debug indicators on transformations help you follow breakpoints and data flow.
Instance window. When the Debugger pauses, you can view transformation data and row information in the Instance window.
Target window. View target data for each target in the mapping.
Output window. The Integration Service writes messages to the following tabs in the Output window:
Debugger tab. The debug log displays in the Debugger tab.
Session Log tab. The session log displays in the Session Log tab.
Notifications tab. Displays messages from the Repository Service.

While you monitor the Debugger, you might want to change the transformation output data to see the effect on subsequent transformations or targets in the data flow. You might also want to edit or add more breakpoint information to monitor the session more closely.

Restrictions: You cannot change data for the following output ports:

Normalizer transformation. Generated Keys and Generated Column ID ports.
Rank transformation. RANKINDEX port.
Router transformation. All output ports.
Sequence Generator transformation. CURRVAL and NEXTVAL ports.
Lookup transformation. NewLookupRow port for a Lookup transformation configured to use a dynamic cache.
Custom transformation. Ports in output groups other than the current output group.
Java transformation. Ports in output groups other than the current output group.

Additionally, you cannot change data associated with the following:


Mapplets that are not selected for debugging
Input or input/output ports
Output ports when the Debugger pauses on an error breakpoint

MAPPING PARAMETERS & VARIABLES

Mapping parameters and variables represent values in mappings and mapplets. When we use a mapping parameter or variable in a mapping, first we declare the mapping parameter or variable for use in each mapplet or mapping. Then, we define a value for the mapping parameter or variable before we run the session.

MAPPING PARAMETERS

A mapping parameter represents a constant value that we can define before running a session. A mapping parameter retains the same value throughout the entire session.

Example: When we want to extract the records of a particular month during the ETL process, we can create a mapping parameter and use it in the source qualifier SQL override to compare it with the timestamp field.
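As a minimal sketch (the parameter name $$LoadMonth, the table, and the column names are illustrative, not from the original), the SQL override might look like:

SELECT * FROM ORDERS
WHERE TO_CHAR(ORDER_DATE, 'YYYY-MM') = '$$LoadMonth'

The Integration Service expands $$LoadMonth to its defined value before sending the query to the database.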

After we create a parameter, it appears in the Expression Editor. We can then use the parameter in any expression in the mapplet or mapping. We can also use parameters in a source qualifier filter, user-defined join, or extract override, and in the Expression Editor of reusable transformations.

MAPPING VARIABLES

Unlike mapping parameters, mapping variables are values that can change between sessions. The Integration Service saves the latest value of a mapping variable to the repository at the end of each successful session. We can override a saved value with the parameter file. We can also clear all saved values for the session in the Workflow Manager.

We might use a mapping variable to perform an incremental read of the source. For example, we have a source table containing time-stamped transactions and we want to evaluate the transactions on a daily basis. Instead of manually entering a session override to filter source data each time we run the session, we can create a mapping variable, $$IncludeDateTime. In the source qualifier, create a filter to read only rows whose transaction date equals $$IncludeDateTime, such as:

TIMESTAMP = $$IncludeDateTime

In the mapping, use a variable function to set the variable value to increment one day each time the session runs (see the example expression after the list below). If we set the initial value of $$IncludeDateTime to 8/1/2004, the first time the Integration Service runs the session, it reads only rows dated 8/1/2004. During the session, the Integration Service sets $$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to the repository at the end of the session. The next time it runs the session, it reads only rows from August 2, 2004.

Mapping variables can be used in the following transformations:

Expression
Filter
Router
Update Strategy
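A minimal sketch of the variable function for the incremental-read example above: in an Expression transformation, an output port can increment the variable with

SETVARIABLE($$IncludeDateTime, ADD_TO_DATE($$IncludeDateTime, 'DD', 1))

while the Source Qualifier filter stays TIMESTAMP = $$IncludeDateTime. This is only one way to implement the increment described above.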

Initial and Default Value:

When we declare a mapping parameter or variable in a mapping or a mapplet, we can enter an initial value. When the Integration Service needs an initial value, and we did not declare an initial value for the parameter or variable, the Integration Service uses a default value based on the data type of the parameter or variable.

Data type -> Default value
Numeric -> 0
String -> Empty string
Datetime -> 1/1/1

Variable Values: A mapping variable has a start value and a current value.

Start Value: The start value is the value of the variable at the start of the session. The Integration Service looks for the start value in the following order:
1. Value in parameter file
2. Value saved in the repository
3. Initial value
4. Default value

Current Value: The current value is the value of the variable as the session progresses. When a session starts, the current value of a variable is the same as the start value. The final current value for a variable is saved to the repository at the end of a successful session. When a session fails to complete, the Integration Service does not update the value of the variable in the repository.

Note: If a variable function is not used to calculate the current value of a mapping variable, the start value of the variable is saved to the repository.

Variable Data Type and Aggregation Type: When we declare a mapping variable in a mapping, we need to configure the data type and aggregation type for the variable. The Integration Service uses the aggregation type of a mapping variable to determine the final current value of the mapping variable. The aggregation types are:

Count: Only integer and small integer data types are valid.
Max: All transformation data types except the binary data type are valid.
Min: All transformation data types except the binary data type are valid.

Variable Functions

Variable functions determine how the Integration Service calculates the current value of a mapping variable in a pipeline.

SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type set to Min.

SetCountVariable: Increments the variable value by one. It adds one to the variable value when a row is marked for insertion, and subtracts one when the row is marked for deletion. It ignores rows marked for update or reject. Aggregation type set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it compares the final current value of the variable to the start value of the variable. Based on the aggregation type of the variable, it saves a final value to the repository.

Creating Mapping Parameters and Variables
1. Open the folder where we want to create the parameter or variable.
2. In the Mapping Designer, click Mappings > Parameters and Variables. -or- In the Mapplet Designer, click Mapplet > Parameters and Variables.
3. Click the Add button.
4. Enter a name. Do not remove the $$ from the name.
5. Select Type and Data type. Select Aggregation type for mapping variables.
6. Give an Initial Value. Click OK.

Example: Use of Mapping Parameters and Variables

EMP will be the source table. Create a target table MP_MV_EXAMPLE having the columns EMPNO, ENAME, DEPTNO, TOTAL_SAL, MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR.
TOTAL_SAL = SAL + COMM + $$BONUS ($$BONUS is a mapping parameter that changes every month).
SET_VAR: We will add one month to the HIREDATE of every employee.
Create shortcuts as necessary.

Creating the Mapping
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name, e.g. m_mp_mv_example.
4. Drag EMP and the target table.
5. Transformation -> Create -> Select Expression from the list -> Create -> Done.
6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to the Expression transformation.
7. Create parameter $$Bonus and give the initial value as 200.
8. Create variable $$var_max of MAX aggregation type and initial value 1500.
9. Create variable $$var_min of MIN aggregation type and initial value 1500.
10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT is visible only when the data type is INT or SMALLINT.
11. Create variable $$var_set of MAX aggregation type.

12. Create 5 output ports: out_TOTAL_SAL, out_MAX_VAR, out_MIN_VAR, out_COUNT_VAR and out_SET_VAR.
13. Open the expression editor for out_TOTAL_SAL. Do the same as we did earlier for SAL + COMM. To add $$BONUS to it, select the Variable tab and select the parameter from Mapping Parameters: SAL + COMM + $$Bonus
14. Open the expression editor for out_max_var.
15. Select the variable function SETMAXVARIABLE from the left side pane. Select $$var_max from the Variable tab and SAL from the Ports tab as shown below: SETMAXVARIABLE($$var_max, SAL)
16. Validate the expression.

17. Open the expression editor for out_min_var and write the following expression: SETMINVARIABLE($$var_min, SAL). Validate the expression.
18. Open the expression editor for out_count_var and write the following expression: SETCOUNTVARIABLE($$var_count). Validate the expression.
19. Open the expression editor for out_set_var and write the following expression: SETVARIABLE($$var_set, ADD_TO_DATE(HIREDATE, 'MM', 1)). Validate.
20. Click OK. The Expression transformation is shown below:

21. Link all ports from the Expression transformation to the target, then validate the mapping and save it.

PARAMETER FILE

A parameter file is a list of parameters and associated values for a workflow, worklet, or session. Parameter files provide flexibility to change these variables each time we run a workflow or session. We can create multiple parameter files and change the file we use for a session or workflow. We can create a parameter file using a text editor such as WordPad or Notepad. Enter the parameter file name and directory in the workflow or session properties.

A parameter file contains the following types of parameters and variables:


Workflow variable: References values and records information in a workflow.
Worklet variable: References values and records information in a worklet. Use predefined worklet variables in a parent workflow, but we cannot use workflow variables from the parent workflow in a worklet.
Session parameter: Defines a value that can change from session to session, such as a database connection or file name.
Mapping parameter and mapping variable: Defines values used in a mapping or mapplet.

USING A PARAMETER FILE

Parameter files contain several sections preceded by a heading. The heading identifies the Integration Service, Integration Service process, workflow, worklet, or session to which we want to assign parameters or variables.
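As a sketch of the common heading formats (the folder, workflow, worklet, and session names below are placeholders, and the exact set of supported headings depends on the PowerCenter version):

[Global]
[folder_name.WF:workflow_name]
[folder_name.WF:workflow_name.WT:worklet_name]
[folder_name.WF:workflow_name.ST:session_name]

Parameters and variables defined under [Global] apply to all objects that use the file.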

Make session and workflow. Give connection information for source and target table. Run workflow and see result.

Sample Parameter File for Our example:

In the parameter file, folder and session names are case sensitive. Create a text file in Notepad with the name Para_File.txt:

[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0

CONFIGURING THE PARAMETER FILE

We can specify the parameter file name and directory in the workflow or session properties.

To enter a parameter file in the workflow properties:
1. Open a workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.

To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field, for example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
4. Click OK.
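A parameter file can also be supplied when starting the workflow from the pmcmd command line. A rough sketch (the service, domain, user, password, and workflow names are placeholders, and the exact options can vary by PowerCenter version):

pmcmd startworkflow -sv Int_Service -d Domain_Name -u Administrator -p password -f Practice -paramfile D:\Files\Para_File.txt wf_mp_mv_example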

SCHEDULERS

We can schedule a workflow to run continuously, repeat at a given time or interval, or we can manually start a workflow. The Integration Service runs a scheduled workflow as configured. By default, the workflow runs on demand. We can change the schedule settings by editing the scheduler. If we change schedule settings, the Integration Service reschedules the workflow according to the new settings.

A scheduler is a repository object that contains a set of schedule settings. A scheduler can be non-reusable or reusable. The Workflow Manager marks a workflow invalid if we delete the scheduler associated with the workflow. If we choose a different Integration Service for the workflow or restart the Integration Service, it reschedules all workflows. If we delete a folder, the Integration Service removes workflows from the schedule.

The Integration Service does not run the workflow if:
The prior workflow run fails.
We remove the workflow from the schedule.
The Integration Service is running in safe mode.

Creating a Reusable Scheduler


For each folder, the Workflow Manager lets us create reusable schedulers so we can reuse the same set of scheduling settings for workflows in the folder. Use a reusable scheduler so we do not need to configure the same set of scheduling settings in each workflow. When we delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the workflows valid, we must edit them and replace the missing scheduler.

Steps:
1. Open the folder where we want to create the scheduler.
2. In the Workflow Designer, click Workflows > Schedulers.
3. Click Add to add a new scheduler.
4. In the General tab, enter a name for the scheduler.
5. Configure the scheduler settings in the Scheduler tab.
6. Click Apply and OK.

Configuring Scheduler Settings

Configure the Schedule tab of the scheduler to set run options, schedule options, start options, and end options for the schedule. There are 3 run options:
1. Run on Demand
2. Run Continuously
3. Run on Server initialization

1. Run on Demand: The Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously: The Integration Service runs the workflow as soon as the service initializes. The Integration Service then starts the next run of the workflow as soon as it finishes the previous run.
3. Run on Server initialization: The Integration Service runs the workflow as soon as the service is initialized. The Integration Service then starts the next run of the workflow according to the settings in Schedule Options.

Schedule options for Run on Server initialization:

Run Once: Run the workflow just once.
Run Every: Run the workflow at regular intervals, as configured.
Customized Repeat: The Integration Service runs the workflow on the dates and times specified in the Repeat dialog box.

Start options for Run on Server initialization: Start Date, Start Time.

End options for Run on Server initialization:

End On: The IS stops scheduling the workflow on the selected date.
End After: The IS stops scheduling the workflow after the set number of workflow runs.
Forever: The IS schedules the workflow as long as the workflow does not fail.

Creating a Non-Reusable Scheduler
1. In the Workflow Designer, open the workflow.
2. Click Workflows > Edit.
3. In the Scheduler tab, choose Non-reusable. Select Reusable if we want to select an existing reusable scheduler for the workflow. Note: If we do not have a reusable scheduler in the folder, we must create one before we choose Reusable.
4. Click the right side of the Scheduler field to edit scheduling settings for the non-reusable scheduler.
5. If we select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.
6. Click OK.

Points to Ponder:

To remove a workflow from its schedule, right-click the workflow in the Navigator window and choose Unscheduled Workflow. To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window and choose Schedule Workflow.

Performance Tuning Overview

The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune session performance, first identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied with the session performance. You can use the test load option to run sessions when you tune session performance. If you tune all the bottlenecks, you can further optimize session performance by increasing the number of pipeline partitions in the session. Adding partitions can improve performance by utilizing more of the system hardware while processing the session.

Because determining the best way to improve performance can be complex, change one variable at a time, and time the session both before and after the change. If session performance does not improve, you might want to return to the original configuration.

Complete the following tasks to improve session performance:
1. Optimize the target. Enables the Integration Service to write to the targets efficiently.
2. Optimize the source. Enables the Integration Service to read source data efficiently.
3. Optimize the mapping. Enables the Integration Service to transform and move data efficiently.
4. Optimize the transformation. Enables the Integration Service to process transformations in a mapping efficiently.
5. Optimize the session. Enables the Integration Service to run the session more quickly.
6. Optimize the grid deployments. Enables the Integration Service to run on a grid with optimal performance.
7. Optimize the Power Center components. Enables the Integration Service and Repository Service to function optimally.
8. Optimize the system. Enables Power Center service processes to run more quickly.

PUSHDOWN OPTIMIZATION

You can push transformation logic to the source or target database using pushdown optimization. When you run a session configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The source or target database executes the SQL queries to process the transformations. The amount of transformation logic you can push to the database depends on the database, the transformation logic, and the mapping and session configuration. The Integration Service processes all transformation logic that it cannot push to a database.

Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization.

The following figure shows a mapping containing transformation logic that can be pushed to the source database:

This mapping contains an Expression transformation that creates an item ID based on the store number 5419 and the item ID from the source. To push the transformation logic to the database, the Integration Service generates the following SQL statement:

INSERT INTO T_ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC)
SELECT CAST((CASE WHEN 5419 IS NULL THEN '' ELSE 5419 END) + '_' + (CASE WHEN ITEMS.ITEM_ID IS NULL THEN '' ELSE ITEMS.ITEM_ID END) AS INTEGER), ITEMS.ITEM_NAME, ITEMS.ITEM_DESC
FROM ITEMS2 ITEMS

The Integration Service generates an INSERT SELECT statement to retrieve the ID, name, and description values from the source table, create new item IDs, and insert the values into the ITEM_ID, ITEM_NAME, and ITEM_DESC columns in the target table. It concatenates the store number 5419, an underscore, and the original ITEM ID to get the new item ID.

Pushdown Optimization Types

You can configure the following types of pushdown optimization:

Source-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the source database.
Target-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the target database.
Full pushdown optimization. The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.

Running Source-Side Pushdown Optimization Sessions

When you run a session configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the source database. The Integration Service generates and executes a SELECT statement based on the transformation logic for each transformation it can push to the database. Then, it reads the results of this SQL query and processes the remaining transformations.

Running Target-Side Pushdown Optimization Sessions

When you run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the target database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the target database. The Integration Service processes the transformation logic up to the point that it can push the transformation logic to the database. Then, it executes the generated SQL on the target database.

Running Full Pushdown Optimization Sessions

To use full pushdown optimization, the source and target databases must be in the same relational database management system. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the target database. It generates and executes SQL statements against the source or target based on the transformation logic it can push to the database.
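As an illustrative sketch (the table, columns, and filter condition are hypothetical, not taken from the original), if a mapping contained only a Source Qualifier and a Filter transformation with the condition PRICE > 100, source-side pushdown optimization might translate the filter into a query such as:

SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.PRICE
FROM ITEMS
WHERE ITEMS.PRICE > 100

Any remaining transformation logic would still be processed by the Integration Service.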

When you run a session with large quantities of data and full pushdown optimization, the database server must run a long transaction. Consider the following database performance issues when you generate a long transaction:

A long transaction uses more database resources.
A long transaction locks the database for longer periods of time. This reduces database concurrency and increases the likelihood of deadlock.
A long transaction increases the likelihood of an unexpected event.
To minimize database performance issues for long transactions, consider using source-side or target-side pushdown optimization.

Rules and Guidelines for Functions in Pushdown Optimization Use the following rules and guidelines when pushing functions to a database:

If you use ADD_TO_DATE in transformation logic to change days, hours, minutes, or seconds, you cannot push the function to a Teradata database.
When you push LAST_DAY() to Oracle, Oracle returns the date up to the second. If the input date contains subseconds, Oracle trims the date to the second.
When you push LTRIM, RTRIM, or SOUNDEX to a database, the database treats the argument (' ') as NULL, but the Integration Service treats the argument (' ') as spaces.
An IBM DB2 database and the Integration Service produce different results for STDDEV and VARIANCE. IBM DB2 uses a different algorithm than other databases to calculate STDDEV and VARIANCE.
When you push SYSDATE or SYSTIMESTAMP to the database, the database server returns the timestamp in the time zone of the database server, not the Integration Service.
If you push SYSTIMESTAMP to an IBM DB2 or a Sybase database, and you specify the format for SYSTIMESTAMP, the database ignores the format and returns the complete time stamp.
You can push SYSTIMESTAMP(SS) to a Netezza database, but not SYSTIMESTAMP(MS) or SYSTIMESTAMP(US).
When you push TO_CHAR(DATE) or TO_DATE() to Netezza, dates with subsecond precision must be in the YYYY-MM-DD HH24:MI:SS.US format. If the format is different, the Integration Service does not push the function to Netezza.

PERFORMANCE TUNING OF LOOKUP TRANSFORMATIONS

Lookup transformations are used to look up a set of values in another table. Lookups slow down performance.
1. To improve performance, cache the lookup tables. Informatica can cache all the lookup and reference tables; this makes operations run very fast. (The meaning of cache is given in point 2 of this section, and the procedure for determining the optimum cache size is given at the end of this document.)
2. Even after caching, the performance can be further improved by minimizing the size of the lookup cache. Reduce the number of cached rows by using a SQL override with a restriction.

Cache: A cache stores data in memory so that Informatica does not have to read the table each time it is referenced. This reduces the time taken by the process to a large extent. The cache is automatically generated by Informatica depending on the marked lookup ports or by a user-defined SQL query.

Example of caching by a user-defined query: Suppose we need to look up records where employee_id = eno. employee_id is from the lookup table, EMPLOYEE_TABLE, and eno is the input that comes from the source table, SUPPORT_TABLE. We put the following SQL query override in the Lookup transformation:

SELECT employee_id FROM EMPLOYEE_TABLE

If there are 50,000 employee_id values, then the size of the lookup cache will be 50,000 rows. Instead of the above query, we put the following:

SELECT e.employee_id FROM EMPLOYEE_TABLE e, SUPPORT_TABLE s WHERE e.employee_id = s.eno

If there are 1,000 eno values, then the size of the lookup cache will be only 1,000 rows. But here the performance gain will happen only if the number of records in SUPPORT_TABLE is not huge. Our concern is to make the size of the cache as small as possible.

3. In lookup tables, delete all unused columns and keep only the fields that are used in the mapping.
4. If possible, replace lookups by a Joiner transformation or a single Source Qualifier. A Joiner transformation takes more time than a Source Qualifier transformation.
5. If the lookup transformation specifies several conditions, then place the conditions that use the equality operator (=) first in the Condition tab.
6. In the SQL override query of the lookup table, there will be an ORDER BY clause. Remove it if not needed, or put fewer column names in the ORDER BY list.
7. Do not use caching in the following cases:
-The source is small and the lookup table is large.
-The lookup is done on the primary key of the lookup table.
8. Definitely cache the lookup table columns in the following case:
-The lookup table is small and the source is large.
9. If the lookup data is static, use a persistent cache. Persistent caches help to save and reuse cache files. If several sessions in the same job use the same lookup table, then using a persistent cache will help the sessions reuse cache files. In the case of static lookups, cache files will be built from the memory cache instead of from the database, which will improve the performance.
10. If the source is huge and the lookup table is also huge, then also use a persistent cache.

11. If the target table is the lookup table, then use a dynamic cache. The Informatica server updates the lookup cache as it passes rows to the target.
12. Use only the lookups you want in the mapping. Too many lookups inside a mapping will slow down the session.
13. If the lookup table has a lot of data, then it will take too long to cache or fit in memory. So move those fields to the source qualifier and then join with the main table.
14. If there are several lookups with the same data set, then share the caches.
15. If we are going to return only 1 row, then use an unconnected lookup.
16. All data are read into the cache in the order the fields are listed in the lookup ports. If we have an index that is even partially in this order, the loading of these lookups can be sped up.
17. If the table that we use for the lookup has an index (or if we have the privilege to add an index to the table in the database, do so), then the performance will increase for both cached and uncached lookups.

Optimizing the Bottlenecks
1. If the source is a flat file, ensure that the flat file is local to the Informatica server. If the source is a relational table, then try not to use synonyms or aliases.
2. If the source is a flat file, reduce the number of bytes (by default it is 1024 bytes per line) that Informatica reads per line. If we do this, we can decrease the Line Sequential Buffer Length setting in the session properties.
3. If possible, give a conditional query in the source qualifier so that the records are filtered off as soon as possible in the process.
4. In the source qualifier, if the query has ORDER BY or GROUP BY, then create an index on the source table and order by the index field of the source table.

PERFORMANCE TUNING OF TARGETS

If the target is a flat file, ensure that the flat file is local to the Informatica server. If the target is a relational table, then try not to use synonyms or aliases.
1. Use bulk load whenever possible.
2. Increase the commit level.
3. Drop constraints and indexes of the table before loading.

PERFORMANCE TUNING OF MAPPINGS

A mapping helps to channel the flow of data from source to target with all the transformations in between. Mapping is the skeleton of the Informatica loading process.
1. Avoid executing major SQL queries from mapplets or mappings.
2. Use optimized queries when we are using them.
3. Reduce the number of transformations in the mapping. Active transformations like Rank, Joiner, Filter, Aggregator, etc. should be used as little as possible.
4. Remove all the unnecessary links between the transformations from the mapping.
5. If a single mapping contains many targets, then dividing them into separate mappings can improve performance.
6. If we need to use a single source more than once in a mapping, then keep only one source and source qualifier in the mapping. Then create different data flows as required into different targets or the same target.

7. If a session joins many source tables in one source qualifier, then an optimized query will improve performance.
8. In the SQL query that Informatica generates, an ORDER BY clause will be present. Remove the ORDER BY clause if not needed, or at least reduce the number of column names in that list. For better performance it is best to order by the index field of that table.
9. Combine the mappings that use the same set of source data.
10. In a mapping, fields with the same information should be given the same type and length throughout the mapping. Otherwise time will be spent on field conversions.
11. Instead of doing complex calculations in a query, use an Expression transformation and do the calculation in the mapping.
12. If data is passing through multiple staging areas, removing a staging area will increase performance.
13. Stored procedures reduce performance. Try to keep the stored procedures simple in the mappings.
14. Unnecessary data type conversions should be avoided since data type conversions impact performance.
15. Transformation errors result in performance degradation. Try running the mapping after removing all transformations. If it takes significantly less time than with the transformations, then we have to fine-tune the transformations.
16. Keep database interactions as few as possible.

PERFORMANCE TUNING OF SESSIONS

A session specifies the location from where the data is to be taken, where the transformations are done, and where the data is to be loaded. It has various properties that help us schedule and run the job in the way we want.
1. Partition the session: This creates many connections to the source and target, and loads data in parallel pipelines. Each pipeline will be independent of the other. But the performance of the session will not improve if the number of records is small. Also, the performance will not improve if it does updates and deletes. So session partitioning should be used only if the volume of data is huge and the job is mainly insertion of data.
2. Run the sessions in parallel rather than serial to gain time, if they are independent of each other.
3. Drop constraints and indexes before we run the session. Rebuild them after the session run completes. Dropping can be done in a pre-session script and rebuilding in a post-session script. But if there is too much data, dropping indexes and then rebuilding them will not be possible. In such cases, stage all data, pre-create the index, use a transportable tablespace, and then load into the database.
4. Use bulk loading, external loading, etc. Bulk loading can be used only if the table does not have an index.
5. In a session we have options to treat rows as Data Driven, Insert, Update and Delete. If update strategies are used, then we have to keep it as Data Driven. But when the session does only insertion of rows into the target table, it has to be kept as Insert to improve performance.
6. Increase the database commit level (the point at which the Informatica server is set to commit data to the target table; for example, the commit level can be set at every 50,000 records).
7. By avoiding built-in functions as much as possible, we can improve performance. For example, for concatenation the operator || is faster than the function CONCAT() (see the example after this list). So use operators instead of functions, where possible. Functions like IS_SPACES(), IS_NUMBER(), IIF(), DECODE(), etc. reduce the performance to a big extent in this order, so the preference should be in the opposite order.
8. String functions like SUBSTR, LTRIM, and RTRIM reduce the performance.
In the sources, use delimited strings for the source flat files, or use the varchar data type.
9. Manipulating high-precision data types will slow down the Informatica server, so disable high precision.
10. Localize all source and target tables, stored procedures, views, sequences, etc. Try not to connect across synonyms. Synonyms and aliases slow down the performance.
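A small sketch of the operator-versus-function point above (the port names are illustrative):

FULL_NAME = FIRST_NAME || ' ' || LAST_NAME              -- preferred
FULL_NAME = CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)  -- slower

Both expressions produce the same concatenated result; the first simply avoids the nested function calls.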

DATABASE OPTIMISATION

To gain the best Informatica performance, the database tables, stored procedures and queries used in Informatica should be tuned well.
1. If the source and target are flat files, then they should be present in the system on which the Informatica server is present.
2. Increase the network packet size.
3. The performance of the Informatica server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect session performance, so avoid network connections where possible.
4. Optimize target databases.

IDENTIFICATION OF BOTTLENECKS

The performance of Informatica is dependent on the performance of its several components like the database, network, transformations, mappings, sessions, etc. To tune the performance of Informatica, we have to identify the bottleneck first. A bottleneck may be present in the source, target, transformations, mapping, session, database or network. It is best to identify performance issues in components in the order source, target, transformations, mapping and session. After identifying the bottleneck, apply the tuning mechanisms in whichever way they are applicable to the project.

Identify bottleneck in Source
If the source is a relational table, put a Filter transformation in the mapping, just after the source qualifier, and make the condition of the filter FALSE. So all records will be filtered off and none will proceed to other parts of the mapping. In the original case, without the test filter, the total time taken is as follows: Total Time = time taken by (source + transformations + target load). Now, because of the filter, Total Time = time taken by source. So if the source was fine, then in the latter case the session should take less time. If the session still takes nearly the same time as in the former case, then there is a source bottleneck.

Identify bottleneck in Target
If the target is a relational table, then substitute it with a flat file and run the session. If the time taken now is very much less than the time taken for the session to load to the table, then the target table is the bottleneck.

Identify bottleneck in Transformation
Remove the transformation from the mapping and run it. Note the time taken. Then put the transformation back and run the mapping again. If the time taken now is significantly more than the previous time, then the transformation is the bottleneck. But removal of a transformation for testing can be a pain for the developer since that might require further changes for the session to get into working mode. So we can put a filter with the FALSE condition just after the transformation and run the session. If the session run takes equal time with and without this test filter, then the transformation is the bottleneck.

Identify bottleneck in Sessions

We can use the session log to identify whether the source, target or transformations are the performance bottleneck. Session logs contain thread summary records like the following:

MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.703201] secs, Total Idle Time = [9.560945] secs, Busy Percentage = [18.304876].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.764368] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].

If the busy percentage is 100, then that part is the bottleneck. Basically we have to rely on thread statistics to identify the cause of performance issues. Once the Collect Performance Data option (in the session Properties tab) is enabled, all the performance-related information will appear in the log created by the session.

WORKING WITH TASKS Part 1

The Workflow Manager contains many types of tasks to help you build workflows and worklets. We can create reusable tasks in the Task Developer. Types of tasks:

Task Type -> Tool where task can be created -> Reusable or not
Session -> Task Developer, Workflow Designer, Worklet Designer -> Yes
Email -> Task Developer, Workflow Designer, Worklet Designer -> Yes
Command -> Task Developer, Workflow Designer, Worklet Designer -> Yes
Event-Raise -> Workflow Designer, Worklet Designer -> No
Event-Wait -> Workflow Designer, Worklet Designer -> No
Timer -> Workflow Designer, Worklet Designer -> No
Decision -> Workflow Designer, Worklet Designer -> No
Assignment -> Workflow Designer, Worklet Designer -> No
Control -> Workflow Designer, Worklet Designer -> No

SESSION TASK

A session is a set of instructions that tells the Power Center Server how and when to move data from sources to targets. To run a session, we must first create a workflow to contain the Session task. We can run as many sessions in a workflow as we need. We can run the Session tasks sequentially or concurrently, depending on our needs. The Power Center Server creates several files and in-memory caches depending on the transformations and options used in the session.

EMAIL TASK

The Workflow Manager provides an Email task that allows us to send email during a workflow. It is usually created by the Administrator, and we just drag and use it in our workflow.

Steps:
1. In the Task Developer or Workflow Designer, choose Tasks -> Create.
2. Select an Email task and enter a name for the task. Click Create.
3. Click Done.
4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5. Click the Properties tab.
6. Enter the fully qualified email address of the mail recipient in the Email User Name field.

7. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.
8. Click the Open button in the Email Text field to open the Email Editor.
9. Click OK twice to save your changes.

Example: To send an email when a session completes:
Steps:
1. Create a workflow wf_sample_email.
2. Drag any session task to the workspace.
3. Edit the Session task and go to the Components tab.
4. See the On Success Email option there and configure it.
5. In Type, select reusable or non-reusable.
6. In Value, select the email task to be used.
7. Click Apply -> OK.
8. Validate the workflow and Repository -> Save.

We can also drag the email task and use it as per need. We can set the option to send email on success or failure in the Components tab of a session task.

COMMAND TASK

The Command task allows us to specify one or more shell commands in UNIX, or DOS commands in Windows, to run during the workflow. For example, we can specify shell commands in the Command task to delete reject files, copy a file, or archive target files.

Ways of using a Command task:
1. Standalone Command task: We can use a Command task anywhere in the workflow or worklet to run shell commands.
2. Pre- and post-session shell command: We can call a Command task as the pre- or post-session shell command for a Session task. This is done in the COMPONENTS tab of a session. We can run it as a Pre-Session Command, a Post-Session Success Command, or a Post-Session Failure Command. Select the Value and Type options as we did in the Email task.

Example: to copy a file sample.txt from the D drive to the E drive in Windows, the command is COPY D:\sample.txt E:\ (a UNIX example is shown after the steps below).

Steps for creating a Command task:
1. In the Task Developer or Workflow Designer, choose Tasks -> Create.
2. Select Command Task for the task type.
3. Enter a name for the Command task. Click Create. Then click Done.
4. Double-click the Command task. Go to the Commands tab.
5. In the Commands tab, click the Add button to add a command.
6. In the Name field, enter a name for the new command.
7. In the Command field, click the Edit button to open the Command Editor.
8. Enter only one command in the Command Editor.

9. Click OK to close the Command Editor.
10. Repeat steps 5-9 to add more commands in the task.
11. Click OK.

Steps to create the workflow using the Command task:
1. Create a task using the above steps to copy a file in the Task Developer.
2. Open the Workflow Designer.
3. Workflow -> Create -> Give name and click OK. Start is displayed.
4. Drag a session, say s_m_Filter_example, and the Command task to the workspace.
5. Link Start to the Session task and the Session to the Command task.
6. Double-click the link between the Session and the Command task and give the condition in the editor as $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
7. Workflow -> Validate.
8. Repository -> Save.
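For a UNIX environment, an equivalent standalone or post-session shell command might be (the paths are hypothetical):

cp /data/sample.txt /archive/sample.txt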

WORKING WITH EVENT TASKS

We can define events in the workflow to specify the sequence of task execution. Types of events:

Pre-defined event: A pre-defined event is a file-watch event. This event waits for a specified file to arrive at a given location.
User-defined event: A user-defined event is a sequence of tasks in the workflow. We create events and then raise them as per need.

Steps for creating a User Defined Event:
1. Open any workflow where we want to create an event.
2. Click Workflow -> Edit -> Events tab.
3. Click the Add button to add events and give the names as per need.
4. Click Apply -> OK. Validate the workflow and save it.

Types of Event Tasks:

EVENT RAISE: Event-Raise task represents a user-defined event. We use this task to raise a user defined event.

EVENT WAIT: Event-Wait task waits for a file watcher event or user defined event to occur before executing the next session in the workflow.

Example 1: Use an Event Wait task and make sure that session s_filter_example runs when the abc.txt file is present in the D:\FILES folder.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click OK.
2. Task -> Create -> Select Event Wait. Give a name. Click Create and Done.
3. Link Start to the Event Wait task.
4. Drag s_filter_example to the workspace and link it to the Event Wait task.
5. Right-click the Event Wait task and click EDIT -> EVENTS tab.
6. Select the Pre Defined option there. In the blank space, give the directory and file name to watch, for example: D:\FILES\abc.txt
7. Workflow -> Validate and Repository -> Save.

Example 2: Raise a user-defined event when session s_m_filter_example succeeds. Capture this event in an Event Wait task and run session S_M_TOTAL_SAL_EXAMPLE.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click OK.
2. Workflow -> Edit -> Events tab and add the event EVENT1 there.
3. Drag s_m_filter_example and link it to the START task.
4. Click Tasks -> Create -> Select EVENT RAISE from the list. Give name ER_Example. Click Create and then Done.
5. Link ER_Example to s_m_filter_example.
6. Right-click ER_Example -> EDIT -> Properties tab -> open Value for User Defined Event and select EVENT1 from the list displayed. Apply -> OK.

7. Click the link between ER_Example and s_m_filter_example and give the condition $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
8. Click Tasks -> Create -> Select EVENT WAIT from the list. Give name EW_WAIT. Click Create and then Done.
9. Link EW_WAIT to the START task.
10. Right-click EW_WAIT -> EDIT -> EVENTS tab.
11. Select User Defined there. Select EVENT1 by clicking the Browse Events button.
12. Apply -> OK.
13. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
14. Workflow -> Validate.
15. Repository -> Save.
16. Run the workflow and see the result.


WORKING WITH TASKS Part 2

TIMER TASK

The Timer task allows us to specify the period of time to wait before the Power Center Server runs the next task in the workflow. The Timer task has two types of settings:

Absolute time: We specify the exact date and time, or we can choose a user-defined workflow variable to specify the exact time. The next task in the workflow will run as per the date and time specified.
Relative time: We instruct the Power Center Server to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.

Example: Run session s_m_filter_example 1 minute after the Timer task starts (relative time).
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_timer_task_example -> Click OK.
2. Click Tasks -> Create -> Select TIMER from the list. Give name TIMER_Example. Click Create and then Done.
3. Link TIMER_Example to the START task.
4. Right-click TIMER_Example -> EDIT -> TIMER tab.
5. Select the Relative Time option, give 1 min, and select the "From start time of this task" option.
6. Apply -> OK.
7. Drag s_m_filter_example and link it to TIMER_Example.
8. Workflow -> Validate and Repository -> Save.

DECISION TASK

The Decision task allows us to enter a condition that determines the execution of the workflow, similar to a link condition. The Decision task has a pre-defined variable called $Decision_task_name.condition that represents the result of the decision condition. The Power Center Server evaluates the condition in the Decision task and sets the pre-defined condition variable to True (1) or False (0).

We can specify one decision condition per Decision task.

Example: The Command task should run only if either s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE succeeds. If either s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE fails, then S_m_sample_mapping_EMP should run.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_decision_task_example -> Click OK.
2. Drag s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE to the workspace and link both of them to the START task.
3. Click Tasks -> Create -> Select DECISION from the list. Give name DECISION_Example. Click Create and then Done. Link DECISION_Example to both s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE.
4. Right-click DECISION_Example -> EDIT -> GENERAL tab.
5. Set Treat Input Links As to OR. The default is AND. Apply and click OK.
6. Now edit the Decision task again and go to the PROPERTIES tab. Open the Expression Editor by clicking the VALUE section of the Decision Name attribute and enter the following condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED OR $S_M_TOTAL_SAL_EXAMPLE.Status = SUCCEEDED
7. Validate the condition -> Click Apply -> OK.
8. Drag the Command task and the S_m_sample_mapping_EMP task to the workspace and link them to the DECISION_Example task.
9. Double-click the link between S_m_sample_mapping_EMP and DECISION_Example and give the condition: $DECISION_Example.Condition = 0. Validate and click OK.
10. Double-click the link between the Command task and DECISION_Example and give the condition: $DECISION_Example.Condition = 1. Validate and click OK.
11. Workflow -> Validate and Repository -> Save.
12. Run the workflow and see the result.

CONTROL TASK

We can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition. A parent workflow or worklet is the workflow or worklet that contains the Control task. We give the condition to the link connected to Control Task.

Control Option -> Description
Fail Me -> Fails the Control task.
Fail Parent -> Marks the status of the WF or worklet that contains the Control task as failed.
Stop Parent -> Stops the WF or worklet that contains the Control task.
Abort Parent -> Aborts the WF or worklet that contains the Control task.
Fail Top-Level WF -> Fails the workflow that is running.
Stop Top-Level WF -> Stops the workflow that is running.
Abort Top-Level WF -> Aborts the workflow that is running.

Example: Drag any 3 sessions, and if any one fails, then abort the top-level workflow.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_control_task_example -> Click OK.
2. Drag any 3 sessions to the workspace and link all of them to the START task.
3. Click Tasks -> Create -> Select CONTROL from the list. Give name cntr_task.
4. Click Create and then Done.
5. Link all sessions to the control task cntr_task.
6. Double-click the link between cntr_task and any session, say s_m_filter_example, and give the condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED.
7. Repeat the above step for the remaining 2 sessions also.
8. Right-click cntr_task -> EDIT -> GENERAL tab. Set Treat Input Links As to OR. The default is AND.
9. Go to the PROPERTIES tab of cntr_task and select the value Fail Top-Level Workflow for Control Option.
10. Click Apply and OK.
11. Workflow -> Validate and Repository -> Save.
12. Run the workflow and see the result.

ASSIGNMENT TASK

The Assignment task allows us to assign a value to a user-defined workflow variable. See the workflow variable topic to add user-defined variables. To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables. We cannot assign values to pre-defined workflow variables.
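For example (assuming a user-defined workflow variable $$WF_RunCount has already been added to the workflow; the variable name is illustrative, not from the original), the assignment configured in the Expressions tab might be:

$$WF_RunCount = $$WF_RunCount + 1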

Steps to create an Assignment Task:
1. Open any workflow where we want to use the Assignment task.
2. Edit the workflow and add user-defined variables.
3. Choose Tasks -> Create. Select Assignment Task for the task type.
4. Enter a name for the Assignment task. Click Create. Then click Done.
5. Double-click the Assignment task to open the Edit Task dialog box.
6. On the Expressions tab, click Add to add an assignment.
7. Click the Open button in the User Defined Variables field.
8. Select the variable for which you want to assign a value. Click OK.
9. Click the Edit button in the Expression field to open the Expression Editor.
10. Enter the value or expression you want to assign.
11. Repeat steps 7-10 to add more variable assignments as necessary.
12. Click OK.

SCD TYPE 1

Slowly Changing Dimensions (SCDs) are dimensions that have data that changes slowly, rather than changing on a time-based, regular schedule. For example, you may have a dimension in your database that tracks the sales records of your company's salespeople. Creating sales reports seems simple enough, until a salesperson is transferred from one regional office to another. How do you record such a change in your sales dimension? You could sum or average the sales by salesperson, but if you use that to compare the performance of salespeople, that might give misleading information. If the salesperson that was transferred used to work in a hot market where sales were easy, and now works in a market where sales are infrequent, her totals will look much stronger than the other

salespeople in her new region, even if they are just as good. Or you could create a second salesperson record and treat the transferred person as a new salesperson, but that creates problems as well. Dealing with these issues involves SCD management methodologies.

Type 1: The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all. This is most appropriate when correcting certain types of data errors, such as the spelling of a name. (Assuming you won't ever need to know how it used to be misspelled in the past.) Here is an example of a database table that keeps supplier information:

Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  CA

In this example, Supplier_Code is the natural key and Supplier_Key is a surrogate key. Technically, the surrogate key is not necessary, since the table will be unique by the natural key (Supplier_Code). However, joins will perform better on an integer than on a character string. Now imagine that this supplier moves their headquarters to Illinois. The updated table would simply overwrite this record:

Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  IL

The obvious disadvantage to this method of managing SCDs is that there is no historical record kept in the data warehouse. You can't tell if your suppliers are tending to move to the Midwest, for example. But an advantage to Type 1 SCDs is that they are very easy to maintain.

Explanation with an Example:

Source Table: (01-01-11)
Empno  Ename  Sal
101    A      1000
102    B      2000
103    C      3000

Target Table: (01-01-11)
Empno  Ename  Sal
101    A      1000
102    B      2000
103    C      3000

The necessity of the Lookup transformation is illustrated using the above source and target tables.

Source Table: (01-02-11)
Empno  Ename  Sal
101    A      1000
102    B      2500
103    C      3000
104    D      4000

Target Table: (01-02-11)
Empno  Ename  Sal
101    A      1000
102    B      2500
103    C      3000
104    D      4000

In the second month we have one more employee, with Ename D, added to the table, and the salary of employee B is changed from 2000 to 2500.

Step 1: Import the source table and the target table.


Create a table named emp_source with three columns, as shown above, in Oracle. Import the source from the Source Analyzer. In the same way, create two target tables with the names emp_target1 and emp_target2. Go to the Targets menu and click Generate and Execute to confirm the creation of the target tables. The snapshot of the connections using the different kinds of transformations is shown below.

Step 2: Design the mapping and apply the necessary transformation.

Here in this mapping we use four kinds of transformations, namely the Lookup transformation, Expression transformation, Filter transformation, and Update Strategy transformation. The necessity and usage of each transformation is discussed in detail below.

Lookup Transformation: The purpose of this transformation is to determine whether to insert, delete, update, or reject the rows going into the target table.

The first thing that we are going to do is create a Lookup transformation and connect the Empno from the Source Qualifier to the transformation. The snapshot of choosing the target table is shown below.

What the Lookup transformation does in our mapping is look into the target table and compare it with the Source Qualifier to determine whether to insert, update, delete, or reject rows. In the Ports tab we should add a new column and name it empno1; this is the column to which we connect Empno from the Source Qualifier. The Input port for the first column should be unchecked, whereas the other boxes, Output and Lookup, should be checked. For the newly created column, only the Input and Output boxes should be checked. In the Properties tab: (i) Lookup table name -> Emp_Target.

(ii) Lookup policy on multiple match -> Use First Value. (iii) Connection Information -> Oracle.

In the Conditions tab (i) Click on Add a new condition

(ii) The Lookup table column should be Empno, the transformation port should be Empno1, and the operator should be =.

Expression Transformation: After we are done with the Lookup transformation, we use an Expression transformation to check whether we need to insert the records or update them. The steps to create an Expression transformation are shown below.

Drag all the columns from both the source and the Lookup transformation and drop them onto the Expression transformation. Now double click the transformation, go to the Ports tab, and create two new columns named insert and update. Both of these columns are going to be our output data, so we need to have a check mark only in the Output check box for them. The snapshot of the Edit Transformation window is shown below.

The conditions that we want to apply to our output ports are listed below.

insert: ISNULL(EMPNO1)
update: IIF(NOT ISNULL(EMPNO1) AND DECODE(SAL,SAL1,1,0)=0, 1, 0)
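To see how these evaluate against the example data (a quick check, not part of the mapping itself): for employee 104, the lookup returns a null EMPNO1, so insert evaluates to true; for employee 102, EMPNO1 is not null and DECODE(2500, 2000, 1, 0) returns 0, so update evaluates to 1; for the unchanged employees 101 and 103, both ports evaluate to false/0.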

We are all done here. Click Apply and then OK.

Filter Transformation: We are going to have two Filter transformations, one for inserts and the other for updates.

Connect the insert column from the Expression transformation to the insert column in the first Filter transformation, and in the same way connect the update column from the Expression transformation to the update column in the second Filter. Next, connect Empno, Ename, and Sal from the Expression transformation to both Filter transformations. Records flagged for insert (new employees) pass through Filter transformation 1 to Update Strategy transformation 1 and are written to the target table.

Records whose data has changed pass through Filter transformation 2 to Update Strategy transformation 2, which forwards the updated rows to the target table. Go to the Properties tab in the Edit Transformation window:

(i) The value of the filter condition in Filter 1 is insert. (ii) The value of the filter condition in Filter 2 is update.

The Closer view of the filter Connection is shown below.

Update Strategy Transformation: Determines whether to insert, delete, update or reject the rows.

Drag the respective Empno, Ename, and Sal ports from the Filter transformations and drop them on the respective Update Strategy transformations. Now go to the Properties tab; the value for the update strategy expression is 0 on the 1st Update Strategy transformation and 1 on the 2nd Update Strategy transformation. We are all set here; finally, connect the outputs of the Update Strategy transformations to the target table.
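For reference, the numeric codes used in the update strategy expression correspond to the standard Informatica update strategy constants:

DD_INSERT = 0
DD_UPDATE = 1
DD_DELETE = 2
DD_REJECT = 3

So setting the expression to 0 (or DD_INSERT) on the first transformation flags rows for insert, and 1 (or DD_UPDATE) on the second flags rows for update.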

Step 3: Create the task and Run the work flow.


Don't check the Truncate Table option. Change the target load type from Bulk to Normal. Run the workflow from the task.

Step 4: Preview the Output in the target table.

Type 2

SCD 2 (Complete):

Let us drive the point home using a simple scenario. For example, in the current month, (01-01-11), we are provided with a source table with three columns and three rows in it (Empno, Ename, Sal). In the next month, (01-02-11), a new employee is added and one record changes. We are going to use the SCD Type 2 style to extract and load the records into the target table. The thing to be noticed here is that if there is any update in the salary of any employee, then the history of that employee is kept, with the current date as the start date and the previous date as the end date.

Source Table: (01-01-11)
Empno  Ename  Sal
101    A      1000
102    B      2000
103    C      3000

Target Table: (01-01-11)
Skey  Empno  Ename  Sal   S-date    E-date  Ver  Flag
100   101    A      1000  01-01-10  Null    1    1
200   102    B      2000  01-01-10  Null    1    1
300   103    C      3000  01-01-10  Null    1    1

Source Table: (01-02-11)
Empno  Ename  Sal
101    A      1000
102    B      2500
103    C      3000
104    D      4000

Target Table: (01-02-11)
Skey  Empno  Ename  Sal   S-date    E-date    Ver  Flag
100   101    A      1000  01-02-10  Null      1    1
200   102    B      2000  01-02-10  Null      1    1
300   103    C      3000  01-02-10  Null      1    1
201   102    B      2500  01-02-10  01-01-10  2    0
400   104    D      4000  01-02-10  Null      1    1

In the second month we have one more employee, with Ename D, added to the table, and the salary of employee B is changed from 2000 to 2500.

Step 1: Import the source table and the target table.

Create a table by name emp_source with three columns as shown above in oracle. Import the source from the source analyzer.

Drag the Target table twice on to the mapping designer to facilitate insert or update process. Go to the targets Menu and click on generate and execute to confirm the creation of the target tables. The snap shot of the connections using different kinds of transformations are shown below.

In the target table we are going to add five columns (Skey, Version, Flag, S_date, E_date).

Step 2: Design the mapping and apply the necessary transformation.

Here in this mapping we are going to use four kinds of transformations, namely Lookup (1), Expression (3), Filter (2), and Sequence Generator (1). The necessity and usage of each transformation is discussed in detail below.

Lookup Transformation: The purpose of this transformation is to look up the target table and compare it with the source using the lookup condition.

The first thing that we are going to do is create a Lookup transformation and connect the Empno from the Source Qualifier to the transformation. The snapshot of choosing the target table is shown below.

Drag the Empno column from the Source Qualifier to the Lookup Transformation. The Input Port for only the Empno1 should be checked. In the Properties tab (i) Lookup table name ->Emp_Target.

(ii) Lookup policy on multiple match -> Use Last Value. (iii) Connection Information -> Oracle.

In the Conditions tab (i) Click on Add a new condition

(ii) The Lookup table column should be Empno, the transformation port should be Empno1, and the operator should be =.

Expression Transformation: After we are done with the Lookup transformation, we use an Expression transformation to find out whether the data in the source table matches the target table. We specify the condition here for whether to insert into or update the table. The steps to create an Expression transformation are shown below.

Drag all the columns from both the source and the Lookup transformation and drop them onto the Expression transformation. Now double click the transformation, go to the Ports tab, and create two new columns named insert and update. Both of these columns are going to be output-only ports, so we need to uncheck the Input check box for them. The snapshot of the Edit Transformation window is shown below.

The conditions that we want to apply to our output ports are listed below.

Insert: ISNULL(EMPNO1)
Update: IIF(NOT ISNULL(SKEY) AND DECODE(SAL,SAL1,1,0)=0, 1, 0)

We are all done here. Click Apply and then OK.

Filter Transformation: We need two Filter transformations; the purpose of the first filter is to filter out the records which we are going to insert, and the second does the reverse for updates.

Records flagged for insert (new employees) pass through Filter transformation 1 to Exp 1 and appear in the target table as new rows. Records flagged for update pass through Filter transformation 2 to Exp 2, which forwards the updated input to the target table. Go to the Properties tab in the Edit Transformation window:

(i) The value for the filter condition 1 is Insert. (ii) The value for the filter condition 2 is Update.

The closer view of the connections from the expression to the filter is shown below.

Sequence Generator: We use this to generate an incremental sequence of numbers. The purpose of this in our mapping is to increment the Skey in steps of 100.

We are going to have a Sequence Generator, and its purpose is to increment the values of the Skey in multiples of 100 (a bandwidth of 100). Connect the NEXTVAL output of the Sequence Generator to Exp 1.

Expression Transformation:

Exp 1: It loads the target table with the Skey values. The point to be noticed here is that the Skey gets multiplied by 100, and a new row is generated whenever a new employee is added to the list. Otherwise, no modification is done on the target table.

Drag all the columns from Filter 1 to Exp 1. Now add a new column N_skey; the expression for it is going to be NEXTVAL1 * 100. We make S_date an output port with the expression SYSDATE. Flag is also made an output port with the expression 1. Version is also made an output port with the expression 1.
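Putting the Exp 1 ports together, the output-port expressions would look roughly like this (a sketch; NEXTVAL1 is assumed to be the name of the port connected from the Sequence Generator's NEXTVAL, and E_date is left null to match the target table above):

N_skey  = NEXTVAL1 * 100
S_date  = SYSDATE
E_date  = NULL
Flag    = 1
Version = 1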

Exp 2: If the same employee is found with any updates in his records, then the Skey gets incremented by 1, the version changes to the next higher number, and the Flag for that row is set to 0.

Drag all the columns from Filter 2 to Exp 2. Now add a new column N_skey; the expression for it is going to be SKEY + 1. Both S_date and E_date are set to SYSDATE.
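Consolidating the description of Exp 2, and keeping it consistent with the target table shown earlier, the output-port expressions would be roughly as follows (a sketch, not taken verbatim from the original mapping):

N_skey  = SKEY + 1
Version = VERSION + 1
Flag    = 0
S_date  = SYSDATE
E_date  = SYSDATE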

Exp 3: If any record in the source table gets updated, we make all of its ports output only.

If a change is found, then we update the E_date to the S_date.

Update Strategy: This is the place from where the update instruction is set on the target table.

The update strategy expression is set to 1 (DD_UPDATE).

Step 3: Create the task and Run the work flow.


Don't check the Truncate Table option. Change the target load type from Bulk to Normal. Create the task and run the workflow.

Step 4: Preview the Output in the target table.

SCD Type 3: This method has limited history preservation, and we are going to use Skey as the primary key here.

Source Table: (01-01-2011)
Empno  Ename  Sal
101    A      1000
102    B      2000
103    C      3000

Target Table: (01-01-2011)
Empno  Ename  C-sal  P-sal
101    A      1000   -
102    B      2000   -
103    C      3000   -

Source Table: (01-02-2011)
Empno  Ename  Sal
101    A      1000
102    B      4566
103    C      3000

Target Table: (01-02-2011)
Empno  Ename  C-sal  P-sal
101    A      1000   Null
102    B      4566   -
103    C      3000   -
102    B      4544   4566

I hope the above tables make clear what we are trying to do. Step 1: Initially, in the Mapping Designer, create a mapping as shown below. In this mapping we use Lookup, Expression, Filter, and Update Strategy transformations to serve the purpose. An explanation of each transformation is given below.

Step 2: Here we are going to see the purpose and usage of all the transformations that we have used in the above mapping.

Lookup Transformation: The Lookup transformation looks up the target table and compares it with the source table. Based on the lookup condition, it decides whether we need to update, insert, or delete the data before it is loaded into the target table.

As usual, we connect the Empno column from the Source Qualifier to the Lookup transformation. Before this, the Lookup transformation has to be pointed at the target table. Next we specify the lookup condition empno = empno1. Finally, specify the Connection Information (Oracle) and the lookup policy on multiple match (Use Last Value) in the Properties tab.

Expression Transformation: We are using the Expression Transformation to separate out the Insert-stuffs and Update- Stuffs logically.

Drag all the ports from the Source Qualifier and Look up in to Expression. Add two Ports and Rename them as Insert, Update.

These two ports are goanna be just output ports. Specify the below conditions in the Expression editor for the ports respectively.

Insert: ISNULL(ENO1)
Update: IIF(NOT ISNULL(ENO1) AND DECODE(SAL,CURR_SAL,1,0)=0, 1, 0)

Filter Transformation: We are going to use two Filter transformations to physically split the data into two separate sections, one for the insert process and the other for the update process. Filter 1:

Drag the Insert port and the other three ports which came from the Source Qualifier through the Expression into the first Filter. In the Properties tab, specify the filter condition as Insert.

Filter 2:

Drag the Update port and the other four ports which came from the Lookup through the Expression into the second Filter. In the Properties tab, specify the filter condition as Update.

Update Strategy: Finally we need the update strategy to insert or to update in to the target table. Update Strategy 1: This is intended to insert in to the target table.

Drag all the ports except the insert from the first filter in to this.

In the Properties tab, specify the update strategy expression as 0 or DD_INSERT.

Update Strategy 2: This is intended to update in to the target table.


Drag all the ports except Update from the second Filter into this transformation. In the Properties tab, specify the update strategy expression as 1 or DD_UPDATE.

Finally, connect both Update Strategy transformations to the two instances of the target. Step 3: Create a session for this mapping and run the workflow. Step 4: Observe the output; it should be the same as the second target table shown above.

Incremental Aggregation: When we enable the session option Incremental Aggregation, the Integration Service performs incremental aggregation: it passes source data through the mapping and uses historical cache data to perform the aggregation calculations incrementally. When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes incrementally and you can capture the changes, you can configure the session to process those changes. This allows the Integration Service to update the target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.

For example, you might have a session using a source that receives new data every day. You can capture those incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. You then enable incremental aggregation. When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source. This allows the Integration Service to read and store the necessary aggregate data. On March 2, when you run the session again, you filter out all the records except those time-stamped March 2. The Integration Service then processes the new data and updates the target accordingly. Consider using incremental aggregation in the following circumstances:

You can capture new source data. Use incremental aggregation when you can capture new source data each time you run the session. Use a Stored Procedure or Filter transformation to process new data.
Incremental changes do not significantly change the target. Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the target with complete source data.
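Continuing the March example above, the filter that captures only new source data might test a load-date column (a sketch; LOAD_DATE is a hypothetical audit column, not part of any mapping described in this document):

LOAD_DATE >= TRUNC(SYSDATE)

Only rows loaded on the current day then reach the Aggregator, so each run applies just that day's changes to the cached aggregates.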

Note: Do not use incremental aggregation if the mapping contains percentile or median functions. The Integration Service uses system memory to process these functions in addition to the cache memory you configure in the session properties. As a result, the Integration Service does not store incremental aggregation values for percentile and median functions in disk caches.

Integration Service Processing for Incremental Aggregation
(i) The first time you run an incremental aggregation session, the Integration Service processes the entire source. At the end of the session, the Integration Service stores aggregate data from that session run in two files, the index file and the data file. The Integration Service creates the files in the cache directory specified in the Aggregator transformation properties.
(ii) Each subsequent time you run the session with incremental aggregation, you use the incremental source changes in the session. For each input record, the Integration Service checks historical information in the index file for a corresponding group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the Integration Service creates a new group and saves the record data.
(iii) When writing to the target, the Integration Service applies the changes to the existing target. It saves modified aggregate data in the index and data files to be used as historical data the next time you run the session.

(iv) If the source changes significantly and you want the Integration Service to continue saving aggregate data for future incremental changes, configure the Integration Service to overwrite existing aggregate data with new aggregate data. Each subsequent time you run a session with incremental aggregation, the Integration Service creates a backup of the incremental aggregation files. The cache directory for the Aggregator transformation must contain enough disk space for two sets of the files.
(v) When you partition a session that uses incremental aggregation, the Integration Service creates one set of cache files for each partition.

The Integration Service creates new aggregate data, instead of using historical data, when you perform one of the following tasks:

Save a new version of the mapping.
Configure the session to reinitialize the aggregate cache.
Move the aggregate files without correcting the configured path or directory for the files in the session properties.
Change the configured path or directory for the aggregate files without moving the files to the new location.
Delete cache files.
Decrease the number of partitions.

When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost. Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files. Preparing for Incremental Aggregation: When you use incremental aggregation, you need to configure both mapping and session properties:

Implement mapping logic or a filter to remove pre-existing data.
Configure the session for incremental aggregation and verify that the file directory has enough disk space for the aggregate files.

Configuring the Mapping Before enabling incremental aggregation, you must capture changes in source data. You can use a Filter or Stored Procedure transformation in the mapping to remove pre-existing source data during a session. Configuring the Session Use the following guidelines when you configure the session for incremental aggregation: (i) Verify the location where you want to store the aggregate files.

The index and data files grow in proportion to the source data. Be sure the cache directory has enough disk space to store historical data for the session. When you run multiple sessions with incremental aggregation, decide where you want the files stored. Then, enter the appropriate directory for the process variable, $PMCacheDir, in the Workflow Manager. You can enter session-specific directories for the index and data files. However, by using the process variable for all sessions using incremental aggregation, you can easily change the cache directory when necessary by changing $PMCacheDir.

Changing the cache directory without moving the files causes the Integration Service to reinitialize the aggregate cache and gather new aggregate data. In a grid, Integration Services rebuild incremental aggregation files they cannot find. When an Integration Service rebuilds incremental aggregation files, it loses aggregate history.

(ii) Verify the incremental aggregation settings in the session properties.


You can configure the session for incremental aggregation in the Performance settings on the Properties tab. You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize the cache, the Workflow Manager displays a warning indicating the Integration Service overwrites the existing cache and a reminder to clear this option after running the session.

Workflow Variables

You can create and use variables in a workflow to reference values and record information. For example, use a Variable in a Decision task to determine whether the previous task ran properly. If it did, you can run the next task. If not, you can stop the workflow. Use the following types of workflow variables:

Predefined workflow variables. The Workflow Manager provides predefined workflow variables for tasks within a workflow.
User-defined workflow variables. You create user-defined workflow variables when you create a workflow.

Use workflow variables when you configure the following types of tasks:
Assignment tasks. Use an Assignment task to assign a value to a user-defined workflow variable. For example, you can increment a user-defined counter variable by setting the variable to its current value plus 1.
Decision tasks. Decision tasks determine how the Integration Service runs a workflow. For example, use the Status variable to run a second session only if the first session completes successfully.
Links. Links connect each workflow task. Use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and another link to follow when the decision condition evaluates to false.
Timer tasks. Timer tasks specify when the Integration Service begins to run the next task in the workflow. Use a user-defined date/time variable to specify the time the Integration Service starts to run the next task.

Use the following keywords to write expressions for user-defined and predefined workflow variables:

AND, OR, NOT, TRUE, FALSE, NULL, SYSDATE
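For example, a link condition can combine predefined task variables with these keywords (a sketch; the session name is taken from the Control task example earlier in this document):

$S_M_FILTER_EXAMPLE.Status = SUCCEEDED AND $S_M_FILTER_EXAMPLE.TgtSuccessRows > 0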

Predefined Workflow Variables: Each workflow contains a set of predefined variables that you use to evaluate workflow and task conditions. Use the following types of predefined variables:

Task-specific variables. The Workflow Manager provides a set of task-specific variables for each task in the workflow. Use task-specific variables in a link condition to control the path the Integration Service takes when running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.
Built-in variables. Use built-in variables in a workflow to return run-time or system information such as folder name, Integration Service name, system date, or workflow start time. The Workflow Manager lists built-in variables under the Built-in node in the Expression Editor.

Task-Specific Variables:

Condition (Decision tasks, Integer). Evaluation result of the decision condition expression. If the task fails, the Workflow Manager keeps the condition set to null. Sample syntax: $Dec_TaskStatus.Condition = <TRUE | FALSE | NULL | any integer>

EndTime (All tasks, Date/Time). Date and time the associated task ended. Precision is to the second. Sample syntax: $s_item_summary.EndTime > TO_DATE('11/10/2004 08:13:25')

ErrorCode (All tasks, Integer). Last error code for the associated task. If there is no error, the Integration Service sets ErrorCode to 0 when the task completes. Sample syntax: $s_item_summary.ErrorCode = 24013. Note: You might use this variable when a task consistently fails with this final error message.

ErrorMsg (All tasks, Nstring). Last error message for the associated task. If there is no error, the Integration Service sets ErrorMsg to an empty string when the task completes. Sample syntax: $s_item_summary.ErrorMsg = 'PETL_24013 Session run completed with failure'. Variables of type Nstring can have a maximum length of 600 characters. Note: You might use this variable when a task consistently fails with this final error message.

FirstErrorCode (Session, Integer). Error code for the first error message in the session. If there is no error, the Integration Service sets FirstErrorCode to 0 when the session completes. Sample syntax: $s_item_summary.FirstErrorCode = 7086

FirstErrorMsg (Session, Nstring). First error message in the session. If there is no error, the Integration Service sets FirstErrorMsg to an empty string when the task completes. Sample syntax: $s_item_summary.FirstErrorMsg = 'TE_7086 Tscrubber: Debug info Failed to evalWrapUp'. Variables of type Nstring can have a maximum length of 600 characters.

PrevTaskStatus (All tasks, Integer). Status of the previous task in the workflow that the Integration Service ran. Statuses include: ABORTED, FAILED, STOPPED, SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the previous task. Sample syntax: $Dec_TaskStatus.PrevTaskStatus = FAILED

SrcFailedRows (Session, Integer). Total number of rows the Integration Service failed to read from the source. Sample syntax: $s_dist_loc.SrcFailedRows = 0

SrcSuccessRows (Session, Integer). Total number of rows successfully read from the sources. Sample syntax: $s_dist_loc.SrcSuccessRows > 2500

StartTime (All tasks, Date/Time). Date and time the associated task started. Precision is to the second. Sample syntax: $s_item_summary.StartTime > TO_DATE('11/10/2004 08:13:25')

Status (All tasks, Integer). Status of the previous task in the workflow. Statuses include: ABORTED, DISABLED, FAILED, NOTSTARTED, STARTED, STOPPED, SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the current task. Sample syntax: $s_dist_loc.Status = SUCCEEDED

TgtFailedRows (Session, Integer). Total number of rows the Integration Service failed to write to the target. Sample syntax: $s_dist_loc.TgtFailedRows = 0

TgtSuccessRows (Session, Integer). Total number of rows successfully written to the target. Sample syntax: $s_dist_loc.TgtSuccessRows > 0

TotalTransErrors (Session, Integer). Total number of transformation errors. Sample syntax: $s_dist_loc.TotalTransErrors = 5

User-Defined Workflow Variables:

You can create variables within a workflow. When you create a variable in a workflow, it is valid only in that workflow. Use the variable in tasks within that workflow. You can edit and delete user-defined workflow variables. Use user-defined variables when you need to make a workflow decision based on criteria you specify. For example, you create a workflow to load data to an orders database nightly. You also need to load a subset of this data to

headquarters periodically, every tenth time you update the local orders database. Create separate sessions to update the local database and the one at headquarters.
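In step 3 of the procedure that follows, the decision condition can be written with the MOD function against the counter defined in step 1 (a sketch; $$WorkflowCount is the persistent user-defined variable):

MOD($$WorkflowCount, 10) = 0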

Use a user-defined variable to determine when to run the session that updates the orders database at headquarters. To configure user-defined workflow variables, complete the following steps:
1. Create a persistent workflow variable, $$WorkflowCount, to represent the number of times the workflow has run.
2. Add a Start task and both sessions to the workflow.
3. Place a Decision task after the session that updates the local orders database. Set up the decision condition to check whether the number of workflow runs is evenly divisible by 10. Use the modulus (MOD) function to do this.
4. Create an Assignment task to increment the $$WorkflowCount variable by one.
5. Link the Decision task to the session that updates the database at headquarters when the decision condition evaluates to true. Link it to the Assignment task when the decision condition evaluates to false.
When you configure workflow variables using conditions, the session that updates the local database runs every time the workflow runs. The session that updates the database at headquarters runs every 10th time the workflow runs.

Creating User-Defined Workflow Variables: You can create workflow variables for a workflow in the workflow properties. To create a workflow variable:
1. In the Workflow Designer, create a new workflow or edit an existing one.
2. Select the Variables tab.
3. Click Add.
4. Enter the information in the following fields and click OK:

Name: Variable name. The correct format is $$VariableName. Workflow variable names are not case sensitive. Do not use a single dollar sign ($) for a user-defined workflow variable; the single dollar sign is reserved for predefined workflow variables.
Data type: Data type of the variable. You can select from the following data types: Date/Time, Double, Integer, Nstring.
Persistent: Whether the variable is persistent. Enable this option if you want the value of the variable retained from one execution of the workflow to the next.
Default Value: Default value of the variable. The Integration Service uses this value for the variable during sessions if you do not set a value for the variable in the parameter file and there is no value stored in the repository. Variables of type Date/Time can have the following formats: MM/DD/RR, MM/DD/YYYY, MM/DD/RR HH24:MI, MM/DD/YYYY HH24:MI, MM/DD/RR HH24:MI:SS, MM/DD/YYYY HH24:MI:SS, MM/DD/RR HH24:MI:SS.MS, MM/DD/YYYY HH24:MI:SS.MS, MM/DD/RR HH24:MI:SS.US, MM/DD/YYYY HH24:MI:SS.US, MM/DD/RR HH24:MI:SS.NS, MM/DD/YYYY HH24:MI:SS.NS. You can use the following separators: dash (-), slash (/), backslash (\), colon (:), period (.), and space. The Integration Service ignores extra spaces. You cannot use one- or three-digit values for year or the HH12 format for hour. Variables of type Nstring can have a maximum length of 600 characters.
Is Null: Whether the default value of the variable is null. If the default value is null, enable this option.
Description: Description associated with the variable.

5. To validate the default value of the new workflow variable, click the Validate button.
6. Click Apply to save the new workflow variable.

Constraint-Based Loading: In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:

Active source. Related target tables must have the same active source.
Key relationships. Target tables must have key relationships.
Target connection groups. Targets must be in one target connection group.
Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint-based loading.

Active Source: When target tables receive rows from different active sources, the Integration Service reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive data from different active sources, the Integration Service reverts to normal loading for both targets. The third pipeline contains a source, Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the Integration Service performs constraint-based loading: loading the primary key table first, then the foreign key table.

Key Relationships: When target tables have no key relationships, the Integration Service does not perform constraint-based loading. Similarly, when target tables have circular key relationships, the Integration Service reverts to a normal load. For example, you have one target containing a primary key and a foreign key related to the primary key in a second target. The second target also contains a foreign key that references the primary key in the first target. The Integration Service cannot enforce constraint-based loading for these tables. It reverts to a normal load.

Target Connection Groups: The Integration Service enforces constraint-based loading for targets in the same target connection group. If you want to specify constraint-based loading for multiple targets that receive data from the same active source, you must verify the tables are in the same target connection group. If the tables with the primary key-foreign key relationship are in different target connection groups, the Integration Service cannot enforce constraint-based loading when you run the workflow. To verify that all targets are in the same target connection group, complete the following tasks:

Verify all targets are in the same target load order group and receive data from the same active source.
Use the default partition properties and do not add partitions or partition points.
Define the same target type for all targets in the session properties.
Define the same database connection name for all targets in the session properties.
Choose normal mode for the target load type for all targets in the session properties.

Treat Rows as Insert: Use constraint-based loading when the session option Treat Source Rows As is set to insert. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading. When the mapping contains Update Strategy transformations and you need to load data to a primary key table first, split the mapping using one of the following options:

Load the primary key table in one mapping and the dependent tables in another mapping. Use constraint-based loading to load the primary table.
Perform inserts in one mapping and updates in another mapping.

Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order the Integration Service reads the sources in each target load order group in the mapping. A target load order group is a collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint based loading establishes the order in which the Integration Service loads individual targets within a set of targets receiving data from a single source qualifier. Example The following mapping is configured to perform constraint-based loading:

In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key. Since these tables receive records from a single active source, SQ_A, the Integration Service loads rows to the targets in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by T_2 and T_3. The Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no dependencies on each other, they are not loaded in any particular order. The Integration Service loads T_4 last, because it has a foreign key that references a primary key in T_3. After loading the first set of targets, the Integration Service begins reading source B. If there are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets. If T_6 has a foreign key that references a primary key in T_5, then since T_5 and T_6 receive data from a single active source, the Aggregator AGGTRANS, the Integration Service loads rows to the tables in the following order:

T_5 T_6

T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target, and you use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same database connection for each target and you use the default partition properties. The Integration Service includes T_5 and T_6 in a different target connection group because they are in a different target load order group from the first four targets. Enabling Constraint-Based Loading: When you enable constraint-based loading, the Integration Service orders the target load on a row-by-row basis. To enable constraint-based loading: 1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property. 2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering. 3. Click OK.

Different Transformations

SOURCE QUALIFIER TRANSFORMATION: Active and Connected Transformation.


The Source Qualifier transformation represents the rows that the Power Center Server reads when it runs a session.
It is the only transformation that is not reusable.
It is the default transformation, except in the case of XML or COBOL files.

Tasks performed by Source Qualifier:


Join data originating from the same source database: We can join two or more tables with primary key-foreign key relationships by linking the sources to one Source Qualifier transformation.
Filter rows when the Power Center Server reads source data: If we include a filter condition, the Power Center Server adds a WHERE clause to the default query.
Specify an outer join rather than the default inner join: If we include a user-defined join, the Power Center Server replaces the join information specified by the metadata in the SQL query.
Specify sorted ports: If we specify a number for sorted ports, the Power Center Server adds an ORDER BY clause to the default SQL query.
Select only distinct values from the source: If we choose Select Distinct, the Power Center Server adds a SELECT DISTINCT statement to the default SQL query.
Create a custom query to issue a special SELECT statement for the Power Center Server to read source data: For example, you might use a custom query to perform aggregate calculations.
All of the above are configured in the Properties tab of the Source Qualifier transformation.

SAMPLE MAPPING TO BE MADE:

Source will be the EMP and DEPT tables. Create the target table as shown in the picture above. Create shortcuts in your folder as needed.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_SQ_example
4. Drag EMP, DEPT, and the target into the mapping.
5. Right click SQ_EMP and select Delete from the mapping.
6. Right click SQ_DEPT and select Delete from the mapping.
7. Click Transformation -> Create -> Select Source Qualifier from the list -> Give name -> Click Create.
8. Select both EMP and DEPT. Click OK.
9. Link all as shown in the above picture.
10. Edit SQ -> Properties tab -> Open User Defined Join -> Give the join condition EMP.DEPTNO=DEPT.DEPTNO. Click Apply -> OK.
11. Mapping -> Validate
12. Repository -> Save

Create Session and Workflow as described earlier. Run the Workflow and see the data in target table. Make sure to give connection information for all tables.

SQ PROPERTIES TAB

1) SOURCE FILTER: We can enter a source filter to reduce the number of rows the Power Center Server queries. Note: When we enter a source filter in the session properties, we override the customized SQL query in the Source Qualifier transformation.
Steps:
1. In the Mapping Designer, open a Source Qualifier transformation.
2. Select the Properties tab.
3. Click the Open button in the Source Filter field.
4. In the SQL Editor dialog box, enter the filter. Example: EMP.SAL > 2000
5. Click OK.
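With that filter in place, the query issued against the source would look roughly like the following (a sketch of the generated SQL; the exact column list depends on which ports are connected downstream):

SELECT EMP.EMPNO, EMP.ENAME, EMP.SAL, EMP.DEPTNO
FROM EMP
WHERE EMP.SAL > 2000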

Validate the mapping and save it. Now refresh the session and save the changes. Then run the workflow and see the output.

2) NUMBER OF SORTED PORTS: When we use sorted ports, the Power Center Server adds the ports to the ORDER BY clause in the default query. By default it is 0. If we change it to 1, then the data will be sorted by the column that is at the top in the SQ. Example: DEPTNO in the above figure.

If we want to sort as per ENAME, move ENAME to top. If we change it to 2, then data will be sorted by top two columns.

Steps:
1. In the Mapping Designer, open a Source Qualifier transformation.
2. Select the Properties tab.
3. Enter any number instead of zero for Number of Sorted Ports.
4. Click Apply -> Click OK.

Validate the mapping. Save it. Now refresh session and save the changes. Now run the workflow and see output. 3) SELECT DISTINCT: If we want the Power Center Server to select unique values from a source, we can use the Select Distinct option.

Just check the option in Properties tab to enable it.

4) PRE and POST SQL Commands


The Power Center Server runs pre-session SQL commands against the source database before it reads the source. It runs post-session SQL commands against the source database after it writes to the target. Use a semi-colon (;) to separate multiple statements.

5) USER DEFINED JOINS Entering a user-defined join is similar to entering a custom SQL query. However, we only enter the contents of the WHERE clause, not the entire query.

We can specify equi join, left outer join and right outer join only. We Cannot specify full outer join. To use full outer join, we need to write SQL Query.

Steps:
1. Open the Source Qualifier transformation, and click the Properties tab.
2. Click the Open button in the User Defined Join field. The SQL Editor dialog box appears.
3. Enter the syntax for the join.
4. Click OK -> Again OK.

Validate the mapping. Save it. Now refresh session and save the changes. Now run the workflow and see output.

Join Type          Syntax
Equi Join          DEPT.DEPTNO=EMP.DEPTNO
Left Outer Join    {EMP LEFT OUTER JOIN DEPT ON DEPT.DEPTNO=EMP.DEPTNO}
Right Outer Join   {EMP RIGHT OUTER JOIN DEPT ON DEPT.DEPTNO=EMP.DEPTNO}
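For the equi join above, the effective SQL the Power Center Server generates would look roughly like this (a sketch; the column list depends on the connected ports):

SELECT EMP.EMPNO, EMP.ENAME, EMP.SAL, DEPT.DEPTNO, DEPT.DNAME
FROM EMP, DEPT
WHERE DEPT.DEPTNO=EMP.DEPTNO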

6) SQL QUERY For relational sources, the Power Center Server generates a query for each Source Qualifier transformation when it runs a session. The default query is a SELECT statement for each source column used in the mapping. In other words, the Power Center Server reads only the columns that are connected to another Transformation. In mapping above, we are passing only SAL and DEPTNO from SQ_EMP to Aggregator transformation. Default query generated will be:

SELECT EMP.SAL, EMP.DEPTNO FROM EMP

Viewing the Default Query
1. Open the Source Qualifier transformation, and click the Properties tab.
2. Open SQL Query. The SQL Editor displays.
3. Click Generate SQL.
4. The SQL Editor displays the default query the Power Center Server uses to select source data.
5. Click Cancel to exit.

Note: If we do not cancel the SQL query, the Power Center Server overrides the default query with the custom SQL query.

We can enter any SQL statement supported by our source database. Before entering the query, connect all the input and output ports we want to use in the mapping. Example: Since we can't use a full outer join in a user-defined join, we can write an SQL query for FULL OUTER JOIN: SELECT DEPT.DEPTNO, DEPT.DNAME, DEPT.LOC, EMP.EMPNO, EMP.ENAME, EMP.JOB, EMP.SAL, EMP.COMM, EMP.DEPTNO FROM EMP FULL OUTER JOIN DEPT ON DEPT.DEPTNO=EMP.DEPTNO WHERE SAL>2000

We also added WHERE clause. We can enter more conditions and write More complex SQL.

We can write any query. We can join as many tables in one query as Required if all are in same database. It is very handy and used in most of the projects. Important Points:

When creating a custom SQL query, the SELECT statement must list the port names in the order in which they appear in the transformation.

Example: DEPTNO is the top column and DNAME is second in our SQ mapping. So when we write the SQL query, the SELECT statement must have DEPTNO first, DNAME second, and so on: SELECT DEPT.DEPTNO, DEPT.DNAME

Once we have written a custom query like the above, this query will always be used to fetch data from the database. In our example, we used WHERE SAL>2000. Now if we use the Source Filter and give the condition SAL > 1000 or any other, it will not work; Informatica will always use the custom query only. Make sure to test the query in the database first before using it in SQL Query. If the query does not run in the database, then it won't work in Informatica either. Also, always connect to the database and validate the SQL in the SQL query editor.

FILTER TRANSFORMATION

Active and connected transformation.

We can filter rows in a mapping with the Filter transformation. We pass all the rows from a source transformation through the Filter transformation, and then enter a Filter condition for the transformation. All ports in a Filter transformation are input/output and only rows that meet the condition pass through the Filter Transformation.

Example: to filter records where SAL>2000


Import the source table EMP into the Shared folder. If it is already there, then don't import it. In the Shared folder, create the target table Filter_Example. Keep all fields as in the EMP table. Create the necessary shortcuts in the folder.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_filter_example
4. Drag EMP from the source into the mapping.
5. Click Transformation -> Create -> Select Filter from the list. Give name and click Create. Now click Done.
6. Pass ports from SQ_EMP to the Filter transformation.
7. Edit the Filter transformation. Go to the Properties tab.
8. Click the Value section of the filter condition, and then click the Open button.
9. The Expression Editor appears.
10. Enter the filter condition you want to apply.
11. Click Validate to check the syntax of the conditions you entered.
12. Click OK -> Click Apply -> Click OK.
13. Now connect the ports from the Filter to the target table.
14. Click Mapping -> Validate
15. Repository -> Save

Create a session and workflow as described earlier. Run the workflow and see the data in the target table.

How to filter out rows with null values? To filter out rows containing null values or spaces, use the ISNULL and IS_SPACES functions to test the value of the port. For example, if we want to filter out rows that contain NULLs in the FIRST_NAME port, use the following condition: IIF(ISNULL(FIRST_NAME), FALSE, TRUE). This condition states that if the FIRST_NAME port is NULL, the return value is FALSE and the row should be discarded. Otherwise, the row passes through to the next transformation.

Performance tuning: The Filter transformation is used to filter off unwanted rows based on the conditions we specify.
1. Use the Filter transformation as close to the source as possible so that unwanted data gets eliminated sooner.
2. If elimination of unwanted data can be done by the Source Qualifier instead of the Filter, then eliminate it at the Source Qualifier itself.
3. Use conditional filters and keep the filter condition simple, involving TRUE/FALSE or 1/0.

EXPRESSION TRANSFORMATION

Passive and connected transformation.

Use the Expression transformation to calculate values in a single row before we write to the target. For example, we might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. Use the Expression transformation to perform any non-aggregate calculations. Example: Addition, Subtraction, Multiplication, Division, Concat, Uppercase conversion, lowercase conversion etc. We can also use the Expression transformation to test conditional statements before we output the results to target tables or other transformations. Example: IF, Then, Decode There are 3 types of ports in Expression Transformation:

Input
Output
Variable: Used to store any temporary calculation.

Calculating Values : To use the Expression transformation to calculate values for a single row, we must include the following ports:

Input or input/output ports for each value used in the calculation: For example: To calculate Total Salary, we need salary and commission. Output port for the expression: We enter one expression for each output port. The return value for the output port needs to match the return value of the expression.

We can enter multiple expressions in a single Expression transformation. We can create any number of output ports in the transformation. Example: Calculating Total Salary of an Employee

Import the source table EMP in Shared folder. If it is already there, then dont import. In shared folder, create the target table Emp_Total_SAL. Keep all ports as in EMP table except Sal and Comm in target table. Add Total_SAL port to store the calculation. Create the necessary shortcuts in the folder.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_totalsal
4. Drag EMP from the source into the mapping.
5. Click Transformation -> Create -> Select Expression from the list. Give name and click Create. Now click Done.
6. Link ports from SQ_EMP to the Expression transformation.
7. Edit the Expression transformation. As we do not want Sal and Comm in the target, remove the check from the Output port for both columns.
8. Now create a new port out_Total_SAL. Make it an output port only.
9. Click the small button that appears in the Expression section of the dialog box and enter the expression in the Expression Editor.
10. Enter the expression SAL + COMM. You can select SAL and COMM from the Ports tab in the Expression Editor.
11. Check the expression syntax by clicking Validate.
12. Click OK -> Click Apply -> Click OK.
13. Now connect the ports from the Expression to the target table.
14. Click Mapping -> Validate
15. Repository -> Save
Create a session and workflow as described earlier. Run the workflow and see the data in the target table.
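The expression from step 10 can also be written to guard against null commissions directly (a sketch using the standard IIF and ISNULL expression functions, entered for the out_Total_SAL port):

SAL + IIF(ISNULL(COMM), 0, COMM)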

With the plain SAL + COMM expression, Total_SAL will be null wherever COMM is null. As an alternative to changing the expression, open your mapping and the Expression transformation, select the COMM port, and give 0 as the Default Value. Now apply the changes, validate the mapping, and save. Refresh the session and validate the workflow again. Run the workflow and see the result again.

Now use ERROR in the Default Value of COMM to skip rows where COMM is null. Syntax: ERROR('Any message here'). Similarly, we can use the ABORT function to abort the session if COMM is null. Syntax: ABORT('Any message here'). Make sure to double click the session after doing any changes in the mapping. It will prompt that the mapping has changed; click OK to refresh the mapping. Run the workflow after validating and saving it.

Performance tuning: The Expression transformation is used to perform simple calculations and also to do source lookups.
1. Use operators instead of functions.
2. Minimize the usage of string functions.
3. If we use a complex expression multiple times in the Expression transformation, make that expression a variable port. Then we need to use only this variable for all computations.


ROUTER TRANSFORMATION Active and connected transformation.

A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the Condition. However, a Router transformation tests data for one or more conditions And gives you the option to route rows of data that do not meet any of the conditions to a default output group. Example: If we want to keep employees of France, India, US in 3 different tables, then we can use 3 Filter transformations or 1 Router transformation.

Mapping A uses three Filter transformations while Mapping B produces the same result with one Router transformation. A Router transformation consists of input and output groups, input and output ports, group filter conditions, and properties that we configure in the Designer.

Working with Groups A Router transformation has the following types of groups:

Input: The group that gets the input ports.
Output: User-defined groups and the default group. We cannot modify or delete output ports or their properties.

User-Defined Groups: We create a user-defined group to test a condition based on incoming data. A user-defined group consists of output ports and a group filter Condition. We can create and edit user-defined groups on the Groups tab with the Designer. Create one user-defined group for each condition that we want to specify.

The Default Group: The Designer creates the default group after we create one new user-defined group. The Designer does not allow us to edit or delete the default group. This group does not have a group filter condition associated with it. If all of the conditions evaluate to FALSE, the IS passes the row to the default group. Example: Filtering employees of Department 10 to EMP_10, Department 20 to EMP_20 and rest to EMP_REST

Source is EMP Table. Create 3 target tables EMP_10, EMP_20 and EMP_REST in shared folder. Structure should be same as EMP table. Create the shortcuts in your folder.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_router_example
4. Drag EMP from the source into the mapping.
5. Click Transformation -> Create -> Select Router from the list. Give name and click Create. Now click Done.
6. Pass ports from SQ_EMP to the Router transformation.
7. Edit the Router transformation. Go to the Groups tab.
8. Click the Groups tab, and then click the Add button to create a user-defined group. The default group is created automatically.
9. Click the Group Filter Condition field to open the Expression Editor.
10. Enter a group filter condition. Ex: DEPTNO=10
11. Click Validate to check the syntax of the conditions you entered.
12. Create another group for EMP_20. Condition: DEPTNO=20
13. The rest of the records not matching the above two conditions will be passed to the DEFAULT group. See the sample mapping.
14. Click OK -> Click Apply -> Click OK.
15. Now connect the ports from the Router to the target tables.
16. Click Mapping -> Validate
17. Repository -> Save

Create Session and Workflow as described earlier. Run the Workflow and see the data in target table. Make sure to give connection information for all 3 target tables.

Sample Mapping:

Difference between Router and Filter : We cannot pass rejected data forward in filter but we can pass it in router. Rejected data is in Default Group of router.

SORTER TRANSFORMATION

Connected and Active Transformation The Sorter transformation allows us to sort data. We can sort data in ascending or descending order according to a specified sort key. We can also configure the Sorter transformation for case-sensitive sorting, and specify whether the output rows should be distinct.

When we create a Sorter transformation in a mapping, we specify one or more ports as a sort key and configure each sort key port to sort in ascending or descending order. We also configure sort criteria the Power Center Server applies to all sort key ports and the system resources it allocates to perform the sort operation. The Sorter transformation contains only input/output ports. All data passing through the Sorter transformation is sorted according to a sort key. The sort key is one or more ports that we want to use as the sort criteria. Sorter Transformation Properties 1. Sorter Cache Size: The Power Center Server uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation. The Power Center Server passes all incoming data into the Sorter transformation Before it performs the sort operation.

We can specify any amount between 1 MB and 4 GB for the Sorter cache size. If it cannot allocate enough memory, the Power Center Server fails the Session. For best performance, configure Sorter cache size with a value less than or equal to the amount of available physical RAM on the Power Center Server machine. Informatica recommends allocating at least 8 MB of physical memory to sort data using the Sorter transformation.

2. Case Sensitive: The Case Sensitive property determines whether the Power Center Server considers case when sorting data. When we enable the Case Sensitive property, the Power Center Server sorts uppercase characters higher than lowercase characters. 3. Work Directory: The directory the Power Center Server uses to create temporary files while it sorts data. 4. Distinct: Check this option if we want to remove duplicates. Sorter will sort data according to all the ports when it is selected.

Example: Sorting data of EMP by ENAME
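In SQL terms, this example (with and without the Distinct option described above) behaves roughly like the queries below; a sketch only, since the Sorter does the sorting in the Integration Service cache rather than in the database:

-- ascending sort on the ENAME sort key
SELECT * FROM emp ORDER BY ename ASC;

-- with the Distinct option checked, duplicate rows are removed as well
SELECT DISTINCT * FROM emp ORDER BY ename ASC;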


Source is EMP table. Create a target table EMP_SORTER_EXAMPLE in target designer. Structure same as EMP table. Create the shortcuts in your folder.

Creating Mapping: 1. Open folder where we want to create the mapping. 2. Click Tools -> Mapping Designer. 3. Click Mapping-> Create-> Give mapping name. Ex: m_sorter_example 4. Drag EMP from source in mapping. 5. Click Transformation -> Create -> Select Sorter from list. Give name and click Create. Now click done. 6. Pass ports from SQ_EMP to Sorter Transformation. 7. Edit Sorter Transformation. Go to Ports Tab 8. Select ENAME as sort key. CHECK mark on KEY in front of ENAME. 9. Click Properties Tab and Select Properties as needed. 10. Click Apply -> Ok. 11. Drag target table now. 12. Connect the output ports from Sorter to target table. 13. Click Mapping -> Validate 14. Repository -> Save

Create Session and Workflow as described earlier. Run the Workflow and see the data in target table. Make sure to give connection information for all tables.

Sample Sorter Mapping :

Performance Tuning: The Sorter transformation is used to sort the input data. 1. While using the Sorter transformation, configure the sorter cache size to be larger than the input data size. 2. At the Sorter transformation, use hash auto-keys partitioning or hash user-keys partitioning.

RANK TRANSFORMATION

Active and connected transformation

The Rank transformation allows us to select only the top or bottom rank of data. It allows us to select a group of top or bottom values, not just one value. During the session, the Power Center Server caches input data until it can perform the rank calculations. Rank Transformation Properties :

Cache Directory: where the cache will be made.
Top/Bottom: rank as per need.
Number of Ranks: Ex: 1, 2 or any number.
Case Sensitive Comparison: can be checked if needed.
Rank Data Cache Size: can be set.
Rank Index Cache Size: can be set.

Ports in a Rank Transformation :

Port - Number Required - Description
I - Minimum 1 - Port to receive data from another transformation.
O - Minimum 1 - Port we want to pass to other transformation.
V - Not needed - Can use to store values or calculations to use in an expression.
R - Only 1 - Rank port. Rank is calculated according to it. The Rank port is an input/output port. We must link the Rank port to another transformation. Example: Total Salary

Rank Index The Designer automatically creates a RANKINDEX port for each Rank transformation. The Power Center Server uses the Rank Index port to store the ranking position for each row in a group.

For example, if we create a Rank transformation that ranks the top five salaried employees, the rank index numbers the employees from 1 to 5.

The RANKINDEX is an output port only. We can pass the rank index to another transformation in the mapping or directly to a target. We cannot delete or edit it.

Defining Groups Rank transformation allows us to group information. For example: If we want to select the top 3 salaried employees of each Department, we can define a group for Department.

By defining groups, we create one set of ranked rows for each group. We define a group in Ports tab. Click the Group By for needed port. We cannot Group By on port which is also Rank Port.
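For reference, the same top-N-per-group logic can be written in Oracle SQL with an analytic function (a sketch only; in the mapping, the Rank transformation and its RANKINDEX port produce this result inside the Integration Service, without any SQL):

-- top 3 salaried employees of each department, with their rank
SELECT empno, ename, deptno, sal, rnk AS rank_index
FROM (SELECT empno, ename, deptno, sal,
             RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS rnk
      FROM emp)
WHERE rnk <= 3;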

1) Example: Finding Top 5 Salaried Employees


EMP will be source table. Create a target table EMP_RANK_EXAMPLE in target designer. Structure should be same as EMP table. Just add one more port Rank_Index to store RANK INDEX. Create the shortcuts in your folder.

Creating Mapping: 1. Open folder where we want to create the mapping. 2. Click Tools -> Mapping Designer. 3. Click Mapping-> Create-> Give mapping name. Ex: m_rank_example 4. Drag EMP from source in mapping. 5. Create an EXPRESSION transformation to calculate TOTAL_SAL. 6. Click Transformation -> Create -> Select RANK from list. Give name and click Create. Now click done. 7. Pass ports from Expression to Rank Transformation. 8. Edit Rank Transformation. Go to Ports Tab 9. Select TOTAL_SAL as rank port. Check R type in front of TOTAL_SAL. 10. Click Properties Tab and Select Properties as needed. 11. Top in Top/Bottom and Number of Ranks as 5. 12. Click Apply -> Ok. 13. Drag target table now. 14. Connect the output ports from Rank to target table. 15. Click Mapping -> Validate 16. Repository -> Save

Create Session and Workflow as described earlier. Run the Workflow and see the data in target table. Make sure to give connection information for all tables.

2) Example: Finding Top 2 Salaried Employees for every DEPARTMENT


Open the mapping made above. Edit Rank Transformation. Go to Ports Tab. Select Group By for DEPTNO.

Go to Properties tab. Set Number of Ranks as 2. Click Apply -> Ok. Mapping -> Validate and Repository Save.

Refresh the session by double clicking. Save the changes and run the workflow to see the new result.

Sample Rank Mapping:

RANK CACHE

When the Power Center Server runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the Power Center Server replaces the stored row with the input row. Example: Power Center caches the first 5 rows if we are finding the top 5 salaried employees. When the 6th row is read, it is compared with the 5 rows in the cache and placed in the cache if needed. 1) RANK INDEX CACHE: The index cache holds group information from the group by ports. If we are using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.

All Group By Columns are in RANK INDEX CACHE. Ex. DEPTNO

2) RANK DATA CACHE: It holds row data until the Power Center Server completes the ranking and is generally larger than the index cache. To reduce the data cache size, connect only the necessary input/output ports to subsequent transformations.

All variable ports (if any), the rank port, and all ports going out from the Rank transformation are stored in the RANK DATA CACHE. Example: all ports except DEPTNO in our mapping example.

Transaction Control transformation Power Center lets you control commit and roll back transactions based on a set of rows that pass through a Transaction Control transformation. A transaction is the set of rows bound by commit or roll back rows. You can define a

transaction based on a varying number of input rows. You might want to define transactions based on a group of rows ordered on a common key, such as employee ID or order entry date. In Power Center, you define transaction control at the following levels:

Within a mapping. Within a mapping, you use the Transaction Control transformation to define a transaction. You define transactions using an expression in a Transaction Control transformation. Based on the return value of the expression, you can choose to commit, roll back, or continue without any transaction changes. Within a session. When you configure a session, you configure it for user-defined commit. You can choose to commit or roll back a transaction if the Integration Service fails to transform or write any row to the target.

When you run the session, the Integration Service evaluates the expression for each row that enters the transformation. When it evaluates a commit row, it commits all rows in the transaction to the target or targets. When the Integration Service evaluates a roll back row, it rolls back all rows in the transaction from the target or targets. If the mapping has a flat file target you can generate an output file each time the Integration Service starts a new transaction. You can dynamically name each target flat file. Properties Tab On the Properties tab, you can configure the following properties:

Transaction control expression Tracing level

Enter the transaction control expression in the Transaction Control Condition field. The transaction control expression uses the IIF function to test each row against the condition. Use the following syntax for the expression: IIF (condition, value1, value2) The expression contains values that represent actions the Integration Service performs based on the return value of the condition. The Integration Service evaluates the condition on a row-by-row basis. The return value determines whether the Integration Service commits, rolls back, or makes no transaction changes to the row. When the Integration Service issues a commit or roll back based on the return value of the expression, it begins a new transaction. Use the following built-in variables in the Expression Editor when you create a transaction control expression:

TC_CONTINUE_TRANSACTION. The Integration Service does not perform any transaction change for this row. This is the default value of the expression. TC_COMMIT_BEFORE. The Integration Service commits the transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction. TC_COMMIT_AFTER. The Integration Service writes the current row to the target, commits the transaction, and begins a new transaction. The current row is in the committed transaction. TC_ROLLBACK_BEFORE. The Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction. TC_ROLLBACK_AFTER. The Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled back transaction.

If the transaction control expression evaluates to a value other than commit, roll back, or continue, the Integration Service fails the session.

Mapping Guidelines and Validation Use the following rules and guidelines when you create a mapping with a Transaction Control transformation:

If the mapping includes an XML target, and you choose to append or create a new document on commit, the input groups must receive data from the same transaction control point. Transaction Control transformations connected to any target other than relational, XML, or dynamic MQSeries targets are ineffective for those targets. You must connect each target instance to a Transaction Control transformation. You can connect multiple targets to a single Transaction Control transformation. You can connect only one effective Transaction Control transformation to a target. You cannot place a Transaction Control transformation in a pipeline branch that starts with a Sequence Generator transformation. If you use a dynamic Lookup transformation and a Transaction Control transformation in the same mapping, a rolled-back transaction might result in unsynchronized target data. A Transaction Control transformation may be effective for one target and ineffective for another target. If each target is connected to an effective Transaction Control transformation, the mapping is valid. Either all targets or none of the targets in the mapping should be connected to an effective Transaction Control transformation.

Example to Transaction Control: Step 1: Design the mapping.

Step 2: Creating a Transaction Control Transformation.

In the Mapping Designer, click Transformation > Create. Select the Transaction Control transformation. Enter a name for the transformation.[ The naming convention for Transaction Control transformations is TC_TransformationName]. Enter a description for the transformation. Click Create. Click Done. Drag the ports into the transformation. Open the Edit Transformations dialog box, and select the Ports tab.

Select the Properties tab. Enter the transaction control expression that defines the commit and roll back behavior.

Go to the Properties tab and click on the down arrow to get into the Expression Editor window. Type the condition and pick the transaction control variables from the Variables tab (under built-in variables), so that the expression reads: IIF(EMPNO=7654, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)

Connect all the columns from the transformation to the target table and save the mapping. Select the Metadata Extensions tab. Create or edit metadata extensions for the Transaction Control transformation. Click OK.

Step 3: Create the task and the work flow. Step 4: Preview the output in the target table.

STORED PROCEDURE T/F

Passive Transformation. Connected and Unconnected Transformation. Stored procedures are stored and run within the database.

A Stored Procedure transformation is an important tool for populating and maintaining databases. Database administrators create stored procedures to automate tasks that are too complicated for standard SQL statements. Use of Stored Procedure in mapping:

Check the status of a target database before loading data into it. Determine if enough space exists in a database. Perform a specialized calculation. Drop and recreate indexes. Mostly used for this in projects.

Data Passes Between IS and Stored Procedure

One of the most useful features of stored procedures is the ability to send data to the stored procedure, and receive data from the stored procedure. There are three types of data that pass between the Integration Service and the stored procedure: Input/output parameters: Parameters we give as input and the parameters returned from Stored Procedure. Return values: Value returned by Stored Procedure if any. Status codes: Status codes provide error handling for the IS during a workflow. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. We cannot see this value. The IS uses it to determine whether to continue running the session or stop.

Specifying when the Stored Procedure Runs

Normal: The stored procedure runs where the transformation exists in the mapping on a row-by-row basis. We pass some input to procedure and it returns some calculated values. Connected stored procedures run only in normal mode. Pre-load of the Source: Before the session retrieves data from the source, the stored procedure runs. This is useful for verifying the existence of tables or performing joins of data in a temporary table. Post-load of the Source: After the session retrieves data from the source, the stored procedure runs. This is useful for removing temporary tables. Pre-load of the Target: Before the session sends data to the target, the stored procedure runs. This is useful for dropping indexes or disabling constraints. Post-load of the Target: After the session sends data to the target, the stored procedure runs. This is useful for recreating indexes on the database.

Using a Stored Procedure in a Mapping :

1. Create the stored procedure in the database.
2. Import or create the Stored Procedure transformation.
3. Determine whether to use the transformation as connected or unconnected.
4. If connected, map the appropriate input and output ports.
5. If unconnected, either configure the stored procedure to run pre- or post-session, or configure it to run from an expression in another transformation.
6. Configure the session.

Stored Procedures: Connect to Source database and create the stored procedures given below:

CREATE OR REPLACE procedure sp_agg (in_deptno in number, max_sal out number, min_sal out number, avg_sal out number, sum_sal out number)
As
Begin
   select max(Sal), min(sal), avg(sal), sum(sal)
   into max_sal, min_sal, avg_sal, sum_sal
   from emp where deptno=in_deptno group by deptno;
End;
/

CREATE OR REPLACE procedure sp_unconn_1_value (in_deptno in number, max_sal out number)
As
Begin
   Select max(Sal) into max_sal from EMP where deptno=in_deptno;
End;
/

1. Connected Stored Procedure T/F Example: To give input as DEPTNO from DEPT table and find the MAX, MIN, AVG and SUM of SAL from EMP table.
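Before importing the procedures into the Designer, it can help to test them from SQL*Plus; a minimal sketch, assuming the procedures above have been created and the EMP table is available:

-- run SET SERVEROUTPUT ON first so the output is displayed
DECLARE
   v_max NUMBER;
   v_min NUMBER;
   v_avg NUMBER;
   v_sum NUMBER;
BEGIN
   sp_agg(10, v_max, v_min, v_avg, v_sum);   -- check department 10
   DBMS_OUTPUT.PUT_LINE('MAX=' || v_max || ' MIN=' || v_min ||
                        ' AVG=' || v_avg || ' SUM=' || v_sum);
END;
/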

DEPT will be source table. Create a target table SP_CONN_EXAMPLE with fields DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL & SUM_SAL. Write Stored Procedure in Database first and Create shortcuts as needed.

Creating Mapping: 1. Open folder where we want to create the mapping. 2. Click Tools -> Mapping Designer.

3. Click Mapping-> Create-> Give name. Ex: m_SP_CONN_EXAMPLE 4. Drag DEPT and Target table. 5. Transformation -> Import Stored Procedure -> Give Database Connection -> Connect -> Select the procedure sp_agg from the list.

6. Drag DEPTNO from SQ_DEPT to the stored procedure input port and also to DEPTNO port of target. 7. Connect the ports from procedure to target as shown below: 8. Mapping -> Validate 9. Repository -> Save

Create Session and then workflow. Give connection information for all tables. Give connection information for Stored Procedure also. Run workflow and see the result in table.

2. Unconnected Stored Procedure T/F : An unconnected Stored Procedure transformation is not directly connected to the flow of data through the mapping. Instead, the stored procedure runs either:

From an expression: Called from an expression transformation. Pre- or post-session: Runs before or after a session.

Method of returning the value of output parameters to a port:


Assign the output value to a local variable. Assign the output value to the system variable PROC_RESULT. (See Later)

Example 1: DEPTNO as input and get MAX of Sal as output.


DEPT will be source table. Create a target table with fields DEPTNO and MAX_SAL of decimal data type. Write Stored Procedure in Database first and Create shortcuts as needed.

Creating Mapping: 1. Open folder where we want to create the mapping. 2. Click Tools -> Mapping Designer.

3. Click Mapping-> Create-> Give name. Ex: m_sp_unconn_1_value 4. Drag DEPT and Target table. 5. Transformation -> Import Stored Procedure -> Give Database Connection -> Connect -> Select the procedure sp_unconn_1_value from the list. Click OK. 6. Stored Procedure has been imported. 7. T/F -> Create Expression T/F. Pass DEPTNO from SQ_DEPT to Expression T/F. 8. Edit expression and create an output port OUT_MAX_SAL of decimal data type. 9. Open Expression editor and call the stored procedure as below. Click OK and connect the port from expression to target as in mapping below:

10. Mapping -> Validate 11. Repository Save.


Create Session and then workflow. Give connection information for all tables. Give connection information for Stored Procedure also. Run workflow and see the result in table.

PROC_RESULT use:

If the stored procedure returns a single output parameter or a return value, we use the reserved variable PROC_RESULT as the output variable.

Example: DEPTNO as Input and MAX Sal as output : :SP.SP_UNCONN_1_VALUE(DEPTNO,PROC_RESULT)

If the stored procedure returns multiple output parameters, you must create variables for each output parameter.

Example: DEPTNO as Input and MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL as output then: 1. Create four variable ports in expression VAR_MAX_SAL, VAR_MIN_SAL, VAR_AVG_SAL and VAR_SUM_SAL.

2. Create four output ports in expression OUT_MAX_SAL, OUT_MIN_SAL, OUT_AVG_SAL and OUT_SUM_SAL. 3. Call the procedure in the last variable port, say VAR_SUM_SAL: :SP.SP_AGG (DEPTNO, VAR_MAX_SAL, VAR_MIN_SAL, VAR_AVG_SAL, PROC_RESULT)

Example 2: DEPTNO as Input and MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL as O/P. Stored Procedure to drop index in Pre Load of Target. Stored Procedure to create index in Post Load of Target.

DEPT will be source table. Create a target table SP_UNCONN_EXAMPLE with fields DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL & SUM_SAL. Write Stored Procedure in Database first and Create shortcuts as needed. Stored procedures to drop and create the index on the target are given below. Make sure to create the target table first. Stored Procedures to be created in the Target Database for the next example:

Create or replace procedure CREATE_INDEX
As
Begin
   Execute immediate 'create index unconn_dept on SP_UNCONN_EXAMPLE(DEPTNO)';
End;
/

Create or replace procedure DROP_INDEX
As
Begin
   Execute immediate 'drop index unconn_dept';
End;
/

Creating Mapping: 1. Open folder where we want to create the mapping. 2. Click Tools -> Mapping Designer. 3. Click Mapping-> Create-> Give name. Ex: m_sp_unconn_1_value 4. Drag DEPT and Target table. 5. Transformation -> Import Stored Procedure -> Give Database Connection -> Connect -> Select the procedure sp_agg from the list. Click OK. 6. Stored Procedure has been imported. 7. T/F -> Create Expression T/F. Pass DEPTNO from SQ_DEPT to Expression T/F. 8. Edit Expression and create 4 variable ports and 4 output ports as shown below:

9. Call the procedure in last variable port VAR_SUM_SAL. 10. :SP.SP_AGG (DEPTNO, VAR_MAX_SAL, VAR_MIN_SAL, VAR_AVG_SAL, PROC_RESULT) 11. Click Apply and Ok. 12. Connect to target table as needed. 13. Transformation -> Import Stored Procedure -> Give Database Connection for target -> Connect -> Select the procedure CREATE_INDEX and DROP_INDEX from the list. Click OK. 14. Edit DROP_INDEX -> Properties Tab -> Select Target Pre Load as Stored Procedure Type and in call text write drop_index. Click Apply -> Ok. 15. Edit CREATE_INDEX -> Properties Tab -> Select Target Post Load as Stored Procedure Type and in call text write create_index. Click Apply -> Ok. 16. Mapping -> Validate 17. Repository -> Save

Create Session and then workflow.

Give connection information for all tables. Give connection information for Stored Procedures also. Also make sure that you execute the procedure CREATE_INDEX on database before using them in mapping. This is because, if there is no INDEX on target table, DROP_INDEX will fail and Session will also fail. Run workflow and see the result in table.
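If you would rather not pre-create the index just to keep DROP_INDEX from failing, one option (an alternative sketch, not part of the original procedure above) is to swallow the "index does not exist" error inside the procedure:

Create or replace procedure DROP_INDEX
As
Begin
   Execute immediate 'drop index unconn_dept';
Exception
   When others then
      -- ORA-01418 means the index does not exist; ignore it so the
      -- pre-load call cannot fail the session for that reason
      If sqlcode != -1418 then
         raise;
      End if;
End;
/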

SQL Transformation: You can pass the database connection information to the SQL transformation as input data at run time. The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the query and returns rows and database errors. When you create an SQL transformation, you configure the following options: Mode:-The SQL transformation runs in one of the following modes:

Script mode. The SQL transformation runs ANSI SQL scripts that are externally located. You pass a script name to the transformation with each input row. The SQL transformation outputs one row for each input row. Query mode. The SQL transformation executes a query that you define in a query editor. You can pass strings or parameters to the query to define dynamic queries or change the selection parameters. You can output multiple rows when the query has a SELECT statement. Passive or active transformation. The SQL transformation is an active transformation by default. You can configure it as a passive transformation when you create the transformation. Database type. The type of database the SQL transformation connects to. Connection type. Pass database connection information to the SQL transformation or use a connection object.

Script Mode An SQL transformation running in script mode runs SQL scripts from text files. You pass each script file name from the source to the SQL transformation Script Name port. The script file name contains the complete path to the script file. When you configure the transformation to run in script mode, you create a passive transformation. The transformation returns one row for each input row. The output row contains results of the query and any database error. Rules and Guidelines for Script Mode Use the following rules and guidelines for an SQL transformation that runs in script mode:

You can use a static or dynamic database connection with script mode. To include multiple query statements in a script, you can separate them with a semicolon. You can use mapping variables or parameters in the script file name. The script code page defaults to the locale of the operating system. You can change the locale of the script.

The script file must be accessible by the Integration Service. The Integration Service must have read permissions on the directory that contains the script. The Integration Service ignores the output of any SELECT statement you include in the SQL script. The SQL transformation in script mode does not output more than one row of data for each input row. You cannot use scripting languages such as Oracle PL/SQL or Microsoft/Sybase T-SQL in the script. You cannot use nested scripts where the SQL script calls another SQL script. A script cannot accept run-time arguments.
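As an illustration of what such a script might contain, the file referenced by the Script Name port could hold a couple of plain ANSI SQL statements like the following (a hypothetical example; the file name, table and columns are only placeholders):

-- contents of, for example, C:\scripts\cleanup.sql
-- plain SQL only: no PL/SQL or T-SQL blocks, no nested script calls
UPDATE emp SET comm = 0 WHERE comm IS NULL;
DELETE FROM emp WHERE ename IS NULL;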

Query Mode

When you configure the SQL transformation to run in query mode, you create an active transformation. When an SQL transformation runs in query mode, it executes an SQL query that you define in the transformation. You pass strings or parameters to the query from the transformation input ports to change the query statement or the query data.

You can create the following types of SQL queries in the SQL transformation:

Static SQL query. The query statement does not change, but you can use query parameters to change the data. The Integration Service prepares the query once and runs the query for all input rows. Dynamic SQL query. You can change the query statements and the data. The Integration Service prepares a query for each input row.
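For example, a static query with parameter binding might look like the sketch below, assuming an input port named DEPTNO and output ports ENAME and SAL in that order (the ?PORT? notation for binding an input port is an assumption here; check the port names in your own transformation):

-- static query: the statement never changes, only the bound DEPTNO value does
SELECT ename, sal
FROM   emp
WHERE  deptno = ?DEPTNO?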

Rules and Guidelines for Query Mode Use the following rules and guidelines when you configure the SQL transformation to run in query mode:

The number and the order of the output ports must match the number and order of the fields in the query SELECT clause. The native data type of an output port in the transformation must match the data type of the corresponding column in the database. The Integration Service generates a row error when the data types do not match. When the SQL query contains an INSERT, UPDATE, or DELETE clause, the transformation returns data to the SQL Error port, the pass-through ports, and the Num Rows Affected port when it is enabled. If you add output ports the ports receive NULL data values. When the SQL query contains a SELECT statement and the transformation has a pass-through port, the transformation returns data to the pass-through port whether or not the query returns database data. The SQL transformation returns a row with NULL data in the output ports. You cannot add the "_output" suffix to output port names that you create. You cannot use the pass-through port to return data from a SELECT query. When the number of output ports is more than the number of columns in the SELECT clause, the extra ports receive a NULL value. When the number of output ports is less than the number of columns in the SELECT clause, the Integration Service generates a row error. You can use string substitution instead of parameter binding in a query. However, the input ports must be string data types.

SQL Transformation Properties

After you create the SQL transformation, you can define ports and set attributes in the following transformation tabs:

Ports. Displays the transformation ports and attributes that you create on the SQL Ports tab. Properties. SQL transformation general properties. SQL Settings. Attributes unique to the SQL transformation. SQL Ports. SQL transformation ports and attributes.

Note: You cannot update the columns on the Ports tab. When you define ports on the SQL Ports tab, they display on the Ports tab. Properties Tab Configure the SQL transformation general properties on the Properties tab. Some transformation properties do not apply to the SQL transformation or are not configurable. The following table describes the SQL transformation properties:

Run Time Location: Enter a path relative to the Integration Service node that runs the SQL transformation session. If this property is blank, the Integration Service uses the environment variable defined on the Integration Service node to locate the DLL or shared library. You must copy all DLLs or shared libraries to the run-time location or to the environment variable defined on the Integration Service node. The Integration Service fails to load the procedure when it cannot locate the DLL, shared library, or a referenced file.

Tracing Level: Sets the amount of detail included in the session log when you run a session containing this transformation. When you configure the SQL transformation tracing level to Verbose Data, the Integration Service writes each SQL query it prepares to the session log.

Is Partitionable: Multiple partitions in a pipeline can use this transformation. Use the following options: No (the transformation cannot be partitioned; the transformation and other transformations in the same pipeline are limited to one partition; you might choose No if the transformation processes all the input data together, such as data cleansing), Locally (the transformation can be partitioned, but the Integration Service must run all partitions in the pipeline on the same node; choose Locally when different partitions of the transformation must share objects in memory), Across Grid (the transformation can be partitioned, and the Integration Service can distribute each partition to different nodes). Default is No.

Update Strategy: The transformation defines the update strategy for output rows. You can enable this property for query mode SQL transformations. Default is disabled.

Transformation Scope: The method in which the Integration Service applies the transformation logic to incoming data. Use the following options: Row, Transaction, All Input. Set transaction scope to Transaction when you use transaction control in static query mode. Default is Row for script mode transformations and All Input for query mode transformations.

Output is Repeatable: Indicates if the order of the output data is consistent between session runs. Never (the order of the output data is inconsistent between session runs), Based On Input Order (the output order is consistent between session runs when the input data order is consistent between session runs), Always (the order of the output data is consistent between session runs even if the order of the input data is inconsistent between session runs). Default is Never.

Generate Transaction: The transformation generates transaction rows. Enable this property for query mode SQL transformations that commit data in an SQL query. Default is disabled.

Requires Single Thread Per Partition: Indicates if the Integration Service processes each partition of a procedure with one thread.

Output is Deterministic: The transformation generates consistent output data between session runs. Enable this property to perform recovery on sessions that use this transformation. Default is enabled.

Create Mapping : Step 1: Creating a flat file and importing the source from the flat file.

Create a Notepad file and in it create a table by the name bikes with three columns and three records in it. Create one more Notepad file to hold the path for the bikes file: inside it just type C:\bikes.txt and save it. Import the source (the second Notepad file) using Sources -> Import from File. We then get a wizard with three subsequent windows; follow the on-screen instructions to complete the process of importing the source.

Step 2: Importing the target and applying the transformation. In the same way as specified above, go to Targets -> Import from File and select an empty Notepad file under the name targetforbikes (this is one more blank Notepad file which we should create and save under the above specified name in C:\).

Create two columns in the target table under the names report and error. We are all set here. Now apply the SQL transformation. In the first window when you create the SQL transformation, select script mode. Connect the SQ to the ScriptName port under inputs and connect the other two fields to the output ports correspondingly.

Snapshot for the above discussed things is given below.

Step 3: Design the workflow and run it.


Create the task and the workflow using the naming conventions. Go to the Mappings tab and click on the Source in the left-hand pane to specify the path for the output file.

Step 4: Preview the output data on the target table.

NORMALIZER TRANSFORMATION

Active and Connected Transformation. The Normalizer transformation normalizes records from COBOL and relational sources, allowing us to organize the data. Use a Normalizer transformation instead of the Source Qualifier transformation when we normalize a COBOL source. We can also use the Normalizer transformation with relational sources to create multiple rows from a single row of data.

Example 1: To create 4 records of every employee in EMP table.


EMP will be source table. Create target table Normalizer_Multiple_Records. Structure same as EMP and datatype of HIREDATE as VARCHAR2. Create shortcuts as necessary.

Creating Mapping : 1. Open folder where we want to create the mapping. 2. Click Tools -> Mapping Designer. 3. Click Mapping-> Create-> Give name. Ex: m_Normalizer_Multiple_Records 4. Drag EMP and Target table. 5. Transformation-> Create-> Select Expression-> Give name, Click create, done. 6. Pass all ports from SQ_EMP to Expression transformation. 7. Transformation-> Create-> Select Normalizer-> Give name, create & done. 8. Try dragging ports from Expression to Normalizer. Not possible. 9. Edit Normalizer and go to the Normalizer Tab. Add columns equal to the columns in the EMP table, with the same datatypes. 10. Normalizer doesn't have a DATETIME datatype, so convert HIREDATE to char in the Expression t/f. Create output port out_hdate and do the conversion. 11. Connect ports from Expression to Normalizer. 12. Edit Normalizer and Normalizer Tab. As EMPNO identifies source records and we want 4 records of every employee, give Occurs for EMPNO as 4.

13. Click Apply and then OK. 14. Add link as shown in the mapping below: 15. Mapping -> Validate 16. Repository -> Save

Make session and workflow. Give connection information for source and target table. Run workflow and see result.

Example 2: To break rows into columns

Source:

Roll_Number  Name    ENG  HINDI  MATHS
100          Amit    78   76     90
101          Rahul   76   78     87
102          Jessie  65   98     79

Target :

Roll_Number  Name    Marks
100          Amit    78
100          Amit    76
100          Amit    90
101          Rahul   76
101          Rahul   78
101          Rahul   87
102          Jessie  65
102          Jessie  98
102          Jessie  79

Make the source a flat file. Import it and create the target table. Create the mapping as before. In the Normalizer tab, create only 3 ports Roll_Number, Name and Marks, as there are 3 columns in the target table. Also, as we have 3 marks in the source, give Occurs as 3 for Marks in the Normalizer tab. Connect accordingly and connect to the target. Validate and save. Make the session and workflow and run it. Give Source File Directory and Source File name for the source flat file in the source properties on the mapping tab of the session. See the result.
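For comparison, the same break-up of one row into three rows can be expressed in SQL (a sketch against a hypothetical relational copy of the source called marks_src; in the mapping, the Normalizer with Occurs = 3 on Marks does this inside the Integration Service):

SELECT roll_number, name, eng   AS marks FROM marks_src
UNION ALL
SELECT roll_number, name, hindi AS marks FROM marks_src
UNION ALL
SELECT roll_number, name, maths AS marks FROM marks_src
ORDER BY roll_number;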



LOOKUP TRANSFORMATION

Passive Transformation. Can be Connected or Unconnected; dynamic lookup is connected. Use a Lookup transformation in a mapping to look up data in a flat file or a relational table, view, or synonym. We can import a lookup definition from any flat file or relational database to which both the PowerCenter Client and Server can connect. We can use multiple Lookup transformations in a mapping.

The Power Center Server queries the lookup source based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup source column values based on the lookup condition. Pass the result of the lookup to other transformations and a target. We can use the Lookup transformation to perform following:

Get a related value: EMP has DEPTNO but DNAME is not there. We use a Lookup to get DNAME from the DEPT table based on the lookup condition. Perform a calculation: We want only those employees whose SAL > average SAL. We will write a Lookup override query. Update slowly changing dimension tables: Most important use. We can use a Lookup transformation to determine whether rows already exist in the target.

1. LOOKUP TYPES We can configure the Lookup transformation to perform the following types of lookups:

Connected or Unconnected Relational or Flat File Cached or Un cached

Relational Lookup: When we create a Lookup transformation using a relational table as a lookup source, we can connect to the lookup source using ODBC and import the table definition as the structure for the Lookup transformation.

We can override the default SQL statement if we want to add a WHERE clause or query multiple tables. We can use a dynamic lookup cache with relational lookups.
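For instance, an override for the DEPT lookup used later in this section might simply add a WHERE clause to the generated statement (a sketch under the assumption that DEPTNO, DNAME and LOC are the lookup ports; the exact default statement depends on the ports in your transformation):

-- cache only the rows we actually need for the lookup
SELECT DEPT.DEPTNO AS DEPTNO,
       DEPT.DNAME  AS DNAME,
       DEPT.LOC    AS LOC
FROM   DEPT
WHERE  DEPT.LOC = 'NEW YORK'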

Flat File Lookup: When we use a flat file for a lookup source, we can use any flat file definition in the repository, or we can import it. When we import a flat file lookup source, the Designer invokes the Flat File Wizard.

Cached or Uncached Lookup: We can check the option in the Properties tab to cache the lookup or not. By default, lookup is cached.

Connected and Unconnected Lookup

Connected Lookup:
Receives input values directly from the pipeline.
We can use a dynamic or static cache.
Cache includes all lookup columns used in the mapping.
If there is no match for the lookup condition, the Power Center Server returns the default value for all output ports.
If there is a match for the lookup condition, the Power Center Server returns the result of the lookup condition for all lookup/output ports.
Pass multiple output values to another transformation.
Supports user-defined default values.

Unconnected Lookup:
Receives input values from the result of a :LKP expression in another transformation.
We can use a static cache.
Cache includes all lookup/output ports in the lookup condition and the lookup/return port.
If there is no match for the lookup condition, the Power Center Server returns NULL.
If there is a match for the lookup condition, the Power Center Server returns the result of the lookup condition into the return port.
Pass one output value to another transformation.
Does not support user-defined default values.

2 .LOOKUP T/F COMPONENTS Define the following components when we configure a Lookup transformation in a mapping:

Lookup source Ports Properties Condition

1. Lookup Source: We can use a flat file or a relational table for a lookup source. When we create a Lookup t/f, we can import the lookup source from the following locations:

Any relational source or target definition in the repository. Any flat file source or target definition in the repository. Any table or file that both the Power Center Server and Client machine can connect to. The lookup table can be a single table, or we can join multiple tables in the same database using a lookup SQL override in the Properties tab.

2. Ports:

Port - Lookup Type - Number Needed - Description
I - Connected, Unconnected - Minimum 1 - Input port to Lookup. Usually ports used for the join condition are input ports.
O - Connected, Unconnected - Minimum 1 - Ports going to another transformation from Lookup.
L - Connected, Unconnected - Minimum 1 - Lookup port. The Designer automatically designates each column in the lookup source as a lookup (L) and output port (O).
R - Unconnected - 1 Only - Return port. Use only in unconnected Lookup t/f.

3. Properties Tab

Option - Lookup Type - Description
Lookup SQL Override - Relational - Overrides the default SQL statement to query the lookup table.
Lookup Table Name - Relational - Specifies the name of the table from which the transformation looks up and caches values.
Lookup Caching Enabled - Flat File, Relational - Indicates whether the Power Center Server caches lookup values during the session.
Lookup Policy on Multiple Match - Flat File, Relational - Determines what happens when the Lookup transformation finds multiple rows that match the lookup condition. Options: Use First Value, Use Last Value, Use Any Value or Report Error.
Lookup Condition - Flat File, Relational - Displays the lookup condition you set in the Condition tab.
Connection Information - Relational - Specifies the database containing the lookup table.
Source Type - Flat File, Relational - Lookup is from a database or flat file.
Lookup Cache Directory Name - Flat File, Relational - Location where the cache is built.
Lookup Cache Persistent - Flat File, Relational - Whether to use persistent cache or not.
Dynamic Lookup Cache - Flat File, Relational - Whether to use dynamic cache or not.
Recache From Lookup Source - Flat File, Relational - Rebuilds the cache if the cache source changes and we are using persistent cache.
Insert Else Update - Relational - Use only with dynamic caching enabled. Applies to rows entering the Lookup transformation with the row type of insert.
Lookup Data Cache Size - Flat File, Relational - Data cache size.
Lookup Index Cache Size - Flat File, Relational - Index cache size.
Cache File Name Prefix - Flat File, Relational - Use only with persistent lookup cache. Specifies the file name prefix to use with persistent lookup cache files.

Some other properties for Flat Files are:


Datetime Format, Thousand Separator, Decimal Separator, Case-Sensitive String Comparison, Null Ordering, Sorted Input.

4: Condition Tab We enter the Lookup Condition. The Power Center Server uses the lookup condition to test incoming values. We compare transformation input values with values in the lookup source or cache, represented by lookup ports.

The data types in a condition must match. When we enter multiple conditions, the Power Center Server evaluates each condition as an AND, not an OR. The Power Center Server matches null values. The input value must meet all conditions for the lookup to return a value. =, >, <, >=, <=, != Operators can be used. Example: IN_DEPTNO = DEPTNO

In_DNAME = 'DELHI' Tip: If we include more than one lookup condition, place the conditions with an equal sign first to optimize lookup performance. Note: 1. We can use only the = operator in case of Dynamic Cache. 2. The Power Center Server fails the session when it encounters multiple keys for a Lookup transformation configured to use a dynamic cache.

3. Connected Lookup Transformation Example: To create a connected Lookup Transformation


EMP will be source table. DEPT will be LOOKUP table. Create a target table CONN_Lookup_EXAMPLE in target designer. Table should contain all ports of EMP table plus DNAME and LOC as shown below. Create the shortcuts in your folder.

Creating Mapping: 1. Open folder where we want to create the mapping. 2. Click Tools -> Mapping Designer. 3. Click Mapping-> Create-> Give name. Ex: m_CONN_LOOKUP_EXAMPLE 4. Drag EMP and Target table. 5. Connect all fields from SQ_EMP to target except DNAME and LOC. 6. Transformation-> Create -> Select LOOKUP from list. Give name and click Create. 7. The Following screen is displayed. 8. As DEPT is the Source definition, click Source and then Select DEPT. 9. Click Ok.

10. Now Pass DEPTNO from SQ_EMP to this Lookup. DEPTNO from SQ_EMP will be named as DEPTNO1. Edit Lookup and rename it to IN_DEPTNO in ports tab. 11. Now go to CONDITION tab and add CONDITION.

DEPTNO = IN_DEPTNO and Click Apply and then OK. Link the mapping as shown below: 12. We are not passing IN_DEPTNO and DEPTNO to any other transformation from LOOKUP; we can edit the lookup transformation and remove the OUTPUT check from them. 13. Mapping -> Validate 14. Repository -> Save

Create Session and Workflow as described earlier. Run the workflow and see the data in target table. Make sure to give connection information for all tables. Make sure to give connection for LOOKUP Table also.

We use Connected Lookup when we need to return more than one column from the Lookup table. There is no use of the Return Port in a Connected Lookup. SEE PROPERTY TAB FOR ADVANCED SETTINGS
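In plain SQL terms, the connected lookup in this mapping behaves roughly like a left outer join from EMP to DEPT (a sketch only, and only under the assumption that there is at most one DEPT row per DEPTNO; the real work happens row by row against the lookup cache):

SELECT e.empno, e.ename, e.job, e.mgr, e.hiredate, e.sal, e.comm, e.deptno,
       d.dname, d.loc
FROM   emp e
LEFT OUTER JOIN dept d
       ON e.deptno = d.deptno;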

4. Unconnected Lookup Transformation An unconnected Lookup transformation is separate from the pipeline in the mapping. We write an expression using the :LKP reference qualifier to call the lookup within another transformation. Steps to configure Unconnected Lookup: 1. Add input ports. 2. Add the lookup condition. 3. Designate a return value. 4. Call the lookup from another transformation.

Example: To create an unconnected Lookup Transformation

EMP will be source table. DEPT will be LOOKUP table.

Create a target table UNCONN_Lookup_EXAMPLE in target designer. Table should contain all ports of EMP table plus DNAME as shown below. Create the shortcuts in your folder.

Creating Mapping: 1. Open folder where we want to create the mapping. 2. Click Tools -> Mapping Designer. 3. Click Mapping-> Create-> Give name. Ex: m_UNCONN_LOOKUP_EXAMPLE 4. Drag EMP and Target table. 5. Now Transformation-> Create -> Select EXPRESSION from list. Give name and click Create. Then Click Done. 6. Pass all ports from SQ_EMP to EXPRESSION transformation. 7. Connect all fields from EXPRESSION to target except DNAME. 8. Transformation-> Create -> Select LOOKUP from list. Give name and click Create. 9. Follow the steps as in Connected above to create Lookup on DEPT table. 10. Click Ok. 11. Now Edit the Lookup Transformation. Go to Ports tab. 12. As DEPTNO is common in source and Lookup, create a port IN_DEPTNO in the Ports tab. Make it an Input port only and give it the same datatype as DEPTNO. 13. Designate DNAME as Return Port. Check on R to make it.

14. Now add a condition in Condition Tab. DEPTNO = IN_DEPTNO and Click Apply and then OK. 15. Now we need to call this Lookup from Expression Transformation. 16. Edit Expression t/f and create a new output port out_DNAME with the same data type as DNAME. Open the Expression editor and call the Lookup as given below: we double click Unconn at the bottom of the Functions tab and, as we need only DEPTNO, we pass only DEPTNO as input. 17. Validate the call in Expression editor and Click OK. 18. Mapping -> Validate 19. Repository Save.

Create Session and Workflow as described earlier. Run the workflow and see the data in target table.

Make sure to give connection information for all tables. Make sure to give connection for LOOKUP Table also.

5. Lookup Caches We can configure a Lookup transformation to cache the lookup table. The Integration Service (IS) builds a cache in memory when it processes the first row of data in a cached Lookup transformation. The Integration Service also creates cache files by default in the $PMCacheDir. If the data does not fit in the memory cache, the IS stores the overflow values in the cache files. When session completes, IS releases cache memory and deletes the cache files.

If we use a flat file lookup, the IS always caches the lookup source. We set the Cache type in Lookup Properties.

Lookup Cache Files 1. Lookup Index Cache:

Stores data for the columns used in the lookup condition.

2. Lookup Data Cache:


For a connected Lookup transformation, stores data for the connected output ports, not including ports used in the lookup condition. For an unconnected Lookup transformation, stores data from the return port.

Types of Lookup Caches: 1. Static Cache By default, the IS creates a static cache. It caches the lookup file or table and Looks up values in the cache for each row that comes into the transformation.The IS does not update the cache while it processes the Lookup transformation. 2. Dynamic Cache To cache a target table or flat file source and insert new rows or update existing rows in the cache, use a Lookup transformation with a dynamic cache. The IS dynamically inserts or updates data in the lookup cache and passes data to the target. Target table is also our lookup table. No good for performance if table is huge. 3. Persistent Cache If the lookup table does not change between sessions, we can configure the Lookup transformation to use a persistent lookup cache. The IS saves and reuses cache files from session to session, eliminating the time Required to read the lookup table. 4. Recache from Source

If the persistent cache is not synchronized with the lookup table (that is, the lookup table has changed), we can configure the Lookup transformation to rebuild the lookup cache.

5. Shared Cache

Unnamed cache: When Lookup transformations in a mapping have compatible caching structures, the IS shares the cache by default. You can only share static unnamed caches. Named cache: Use a persistent named cache when we want to share a cache file across mappings or share a dynamic and a static cache. The caching structures must match or be compatible with a named cache. You can share static and dynamic named caches.

Building Connected Lookup Caches We can configure the session to build caches sequentially or concurrently.

When we build sequential caches, the IS creates caches as the source rows enter the Lookup transformation. When we configure the session to build concurrent caches, the IS does not wait for the first row to enter the Lookup transformation before it creates caches. Instead, it builds multiple caches concurrently.

1. Building Lookup Caches Sequentially:

2. Building Lookup Caches Concurrently:

To configure the session to create concurrent caches

Edit Session -> Config Object tab -> Additional Concurrent Pipelines for Lookup Cache Creation -> Give a value here (Auto by default). Note: The IS builds caches for unconnected Lookups sequentially only.

SEQUENCE GENERATOR T/F Passive and Connected Transformation. The Sequence Generator transformation generates numeric values. Use the Sequence Generator to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers.

We use it mostly to generate surrogate keys in a DWH environment. When we want to maintain history, we need a key other than the primary key to uniquely identify the record, so we create a sequence 1, 2, 3, 4 and so on and use it as the key. Example: If EMPNO is the key, we can keep only one record per employee in the target and cannot maintain history, so we use the surrogate key as the primary key instead of EMPNO. Sequence Generator Ports: The Sequence Generator transformation provides two output ports: NEXTVAL and CURRVAL.

We cannot edit or delete these ports. Likewise, we cannot add ports to the transformation.

NEXTVAL: Use the NEXTVAL port to generate sequence numbers by connecting it to a transformation or target. For example, we might connect NEXTVAL to two target tables in a mapping to generate unique primary key values.

The sequence for Table 1 will be generated first; only after Table 1 has been loaded will the sequence for Table 2 be generated. CURRVAL: CURRVAL is NEXTVAL plus the Increment By value.

We typically connect the CURRVAL port only when the NEXTVAL port is already connected to a downstream transformation. If we connect the CURRVAL port without connecting the NEXTVAL port, the Integration Service passes a constant value for each row. When we connect the CURRVAL port in a Sequence Generator transformation, the Integration Service processes one row in each block. We can optimize performance by connecting only the NEXTVAL port in a mapping.
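For example, with the hypothetical settings Start Value 1 and Increment By 1, the two ports would produce the following values for the first few rows (CURRVAL being NEXTVAL plus the Increment By value, as described above):

NEXTVAL: 1, 2, 3, 4, ...
CURRVAL: 2, 3, 4, 5, ...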

Example: To use Sequence Generator transformation


EMP will be source. Create a target EMP_SEQ_GEN_EXAMPLE in shared folder. Structure same as EMP. Add two more ports NEXT_VALUE and CURR_VALUE to the target table. Create shortcuts as needed.

Creating Mapping: 1. Open the folder where we want to create the mapping. 2. Click Tools -> Mapping Designer. 3. Click Mapping -> Create -> Give name. Ex: m_seq_gen_example 4. Drag EMP and the target table into the mapping. 5. Connect all ports from SQ_EMP to the target table. 6. Click Transformation -> Create -> Select Sequence Generator from the list -> Create -> Done. 7. Connect NEXTVAL and CURRVAL from the Sequence Generator to the NEXT_VALUE and CURR_VALUE ports of the target. 8. Validate the mapping. 9. Repository -> Save

Create Session and then workflow. Give connection information for all tables. Run workflow and see the result in table.

Sequence Generator Properties:

Start Value (Required): Start value of the generated sequence that we want the IS to use if we use the Cycle option. Default is 0.

Increment By (Required): Difference between two consecutive values from the NEXTVAL port.

End Value (Optional): Maximum value the Integration Service generates.

Current Value (Optional): First value in the sequence. If the Cycle option is used, this value must be greater than or equal to the Start Value and less than the End Value.

Cycle (Optional): If selected, the Integration Service cycles through the sequence range. Ex: with Start Value 1 and End Value 10, the sequence runs from 1 to 10 and then starts again from 1.

Reset (Optional): By default, the last value of the sequence generated during a session is saved to the repository, and the next run starts from the saved value. If Reset is selected, the Integration Service generates values based on the original current value for each session.

Points to Ponder:

If Current Value is 1 and End Value is 10 with no Cycle option, and there are 17 records in the source, the session will fail.
If we connect just CURRVAL only, the value will be the same for all records.
If Current Value is 1, End Value is 10, the Cycle option is set and Start Value is 0, with 17 records in the source, the sequence is 1, 2, ... 10 and then 0, 1, 2, ... 6.
To make the above sequence run 1-10 and then start again from 1, give Start Value as 1. The Start Value is used along with the Cycle option only.
If Current Value is 1, End Value is 10, the Cycle option is set and Start Value is 1, with 17 records in the source, the session runs and generates 1-10 and then 1-7. The value 7 is saved in the repository, so if we run the session again the sequence starts from 8.
Use the Reset option if you want to start the sequence from the Current Value every time.

SORTER TRANSFORMATION Connected and Active Transformation The Sorter transformation allows us to sort data. We can sort data in ascending or descending order according to a specified sort key.

We can also configure the Sorter transformation for case-sensitive sorting, and specify whether the output rows should be distinct.

When we create a Sorter transformation in a mapping, we specify one or more ports as the sort key and configure each sort key port to sort in ascending or descending order. We also configure the sort criteria the Power Center Server applies to all sort key ports and the system resources it allocates to perform the sort operation. The Sorter transformation contains only input/output ports. All data passing through the Sorter transformation is sorted according to a sort key. The sort key is one or more ports that we want to use as the sort criteria. Sorter Transformation Properties 1. Sorter Cache Size: The Power Center Server uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation. The Power Center Server passes all incoming data into the Sorter transformation before it performs the sort operation.

We can specify any amount between 1 MB and 4 GB for the Sorter cache size. If the Power Center Server cannot allocate enough memory, it fails the session. For best performance, configure the Sorter cache size with a value less than or equal to the amount of available physical RAM on the Power Center Server machine. Informatica recommends allocating at least 8 MB of physical memory to sort data using the Sorter transformation.

2. Case Sensitive: The Case Sensitive property determines whether the Power Center Server considers case when sorting data. When we enable the Case Sensitive property, the Power Center Server sorts uppercase characters higher than lowercase characters. 3. Work Directory: The directory the Power Center Server uses to create temporary files while it sorts data. 4. Distinct: Check this option if we want to remove duplicates. When it is selected, the Sorter sorts data according to all the ports.

Example: Sorting data of EMP by ENAME

Source is EMP table.

Create a target table EMP_SORTER_EXAMPLE in target designer. Structure same as EMP table. Create the shortcuts in your folder.

Creating Mapping: 1. Open the folder where we want to create the mapping. 2. Click Tools -> Mapping Designer. 3. Click Mapping -> Create -> Give mapping name. Ex: m_sorter_example 4. Drag EMP from the source into the mapping. 5. Click Transformation -> Create -> Select Sorter from the list. Give name and click Create, then click Done. 6. Pass ports from SQ_EMP to the Sorter transformation. 7. Edit the Sorter transformation and go to the Ports tab. 8. Select ENAME as the sort key by checking KEY in front of ENAME. 9. Click the Properties tab and select properties as needed. 10. Click Apply -> OK. 11. Now drag the target table. 12. Connect the output ports from the Sorter to the target table. 13. Click Mapping -> Validate 14. Repository -> Save

Create Session and Workflow as described earlier. Run the Workflow and see the data in target table. Make sure to give connection information for all tables.

Sample Sorter Mapping :

Performance Tuning: The Sorter transformation is used to sort the input data. 1. While using the Sorter transformation, configure the Sorter cache size to be larger than the input data size. 2. At the Sorter transformation, use hash auto-keys partitioning or hash user-keys partitioning.

SQL Transformation: You can pass the database connection information to the SQL transformation as input data at run time. The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the query and returns rows and database errors. When you create an SQL transformation, you configure the following options: Mode: The SQL transformation runs in one of the following modes:

Script mode. The SQL transformation runs ANSI SQL scripts that are externally located. You pass a script name to the transformation with each input row. The SQL transformation outputs one row for each input row. Query mode. The SQL transformation executes a query that you define in a query editor. You can pass strings or parameters to the query to define dynamic queries or change the selection parameters. You can output multiple rows when the query has a SELECT statement. Passive or active transformation. The SQL transformation is an active transformation by default. You can configure it as a passive transformation when you create the transformation. Database type. The type of database the SQL transformation connects to.

Connection type. Pass database connection information to the SQL transformation or use a connection object.

Script Mode An SQL transformation running in script mode runs SQL scripts from text files. You pass each script file name from the source to the SQL transformation Script Name port. The script file name contains the complete path to the script file. When you configure the transformation to run in script mode, you create a passive transformation. The transformation returns one row for each input row. The output row contains results of the query and any database error. Rules and Guidelines for Script Mode Use the following rules and guidelines for an SQL transformation that runs in script mode:

You can use a static or dynamic database connection with script mode. To include multiple query statements in a script, you can separate them with a semicolon. You can use mapping variables or parameters in the script file name. The script code page defaults to the locale of the operating system. You can change the locale of the script. The script file must be accessible by the Integration Service. The Integration Service must have read permissions on the directory that contains the script. The Integration Service ignores the output of any SELECT statement you include in the SQL script. The SQL transformation in script mode does not output more than one row of data for each input row. You cannot use scripting languages such as Oracle PL/SQL or Microsoft/Sybase T-SQL in the script. You cannot use nested scripts where the SQL script calls another SQL script. A script cannot accept run-time arguments.
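As an illustration (the file path and table names are hypothetical), a script file such as C:\scripts\load_bikes.sql might contain ordinary ANSI SQL statements separated by semicolons:

INSERT INTO bikes_archive SELECT * FROM bikes;
DELETE FROM bikes WHERE model_year < 2000;

Each source row then supplies the full path C:\scripts\load_bikes.sql in the ScriptName port, and the transformation returns one row per input row containing the script result and any database error. Note that, per the rules above, the output of any SELECT statement inside the script itself is ignored.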

Query Mode

When you configure the SQL transformation to run in query mode, you create an active transformation. When an SQL transformation runs in query mode, it executes an SQL query that you define in the transformation. You pass strings or parameters to the query from the transformation input ports to change the query statement or the query data.

You can create the following types of SQL queries in the SQL transformation:

Static SQL query. The query statement does not change, but you can use query parameters to change the data. The Integration Service prepares the query once and runs the query for all input rows. Dynamic SQL query. You can change the query statements and the data. The Integration Service prepares a query for each input row.
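For illustration (the port and table names are hypothetical), a static query can bind input ports as parameters, while a dynamic query can also substitute part of the statement from a string port:

Static query with parameter binding: SELECT DNAME, LOC FROM DEPT WHERE DEPTNO = ?DEPTNO?
Dynamic query with string substitution: SELECT DNAME, LOC FROM ~TABLE_PORT~ WHERE DEPTNO = ?DEPTNO?

Here ?DEPTNO? is bound to the value of the DEPTNO input port for each row, and ~TABLE_PORT~ is replaced by the table name passed in through a string input port, which is why the Integration Service prepares a new query for each input row in the dynamic case.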

Rules and Guidelines for Query Mode Use the following rules and guidelines when you configure the SQL transformation to run in query mode:

The number and the order of the output ports must match the number and order of the fields in the query SELECT clause. The native data type of an output port in the transformation must match the data type of the corresponding column in the database. The Integration Service generates a row error when the data types do not match.

When the SQL query contains an INSERT, UPDATE, or DELETE clause, the transformation returns data to the SQL Error port, the pass-through ports, and the Num Rows Affected port when it is enabled. If you add output ports, the ports receive NULL data values. When the SQL query contains a SELECT statement and the transformation has a pass-through port, the transformation returns data to the pass-through port whether or not the query returns database data; the SQL transformation returns a row with NULL data in the output ports. You cannot add the "_output" suffix to output port names that you create. You cannot use the pass-through port to return data from a SELECT query. When the number of output ports is more than the number of columns in the SELECT clause, the extra ports receive a NULL value. When the number of output ports is less than the number of columns in the SELECT clause, the Integration Service generates a row error. You can use string substitution instead of parameter binding in a query. However, the input ports must be string data types.

SQL Transformation Properties After you create the SQL transformation, you can define ports and set attributes in the following transformation tabs:

Ports. Displays the transformation ports and attributes that you create on the SQL Ports tab. Properties. SQL transformation general properties. SQL Settings. Attributes unique to the SQL transformation. SQL Ports. SQL transformation ports and attributes.

Note: You cannot update the columns on the Ports tab. When you define ports on the SQL Ports tab, they display on the Ports tab. Properties Tab Configure the SQL transformation general properties on the Properties tab. Some transformation properties do not apply to the SQL transformation or are not configurable. The following table describes the SQL transformation properties:

Run Time Location: Enter a path relative to the Integration Service node that runs the SQL transformation session. If this property is blank, the Integration Service uses the environment variable defined on the Integration Service node to locate the DLL or shared library. You must copy all DLLs or shared libraries to the run-time location or to the environment variable defined on the Integration Service node. The Integration Service fails to load the procedure when it cannot locate the DLL, shared library, or a referenced file.

Tracing Level: Sets the amount of detail included in the session log when you run a session containing this transformation. When you configure the SQL transformation tracing level to Verbose Data, the Integration Service writes each SQL query it prepares to the session log.

Is Partitionable: Multiple partitions in a pipeline can use this transformation. Use the following options: - No. The transformation cannot be partitioned. The transformation and other transformations in the same pipeline are limited to one partition. You might choose No if the transformation processes all the input data together, such as data cleansing. - Locally. The transformation can be partitioned, but the Integration Service must run all partitions in the pipeline on the same node. Choose Locally when different partitions of the transformation must share objects in memory. - Across Grid. The transformation can be partitioned, and the Integration Service can distribute each partition to different nodes. Default is No.

Update Strategy: The transformation defines the update strategy for output rows. You can enable this property for query mode SQL transformations. Default is disabled.

Transformation Scope: The method in which the Integration Service applies the transformation logic to incoming data. Use the following options: - Row - Transaction - All Input. Set the transaction scope to Transaction when you use transaction control in static query mode. Default is Row for script mode transformations and All Input for query mode transformations.

Output is Repeatable: Indicates whether the order of the output data is consistent between session runs. - Never. The order of the output data is inconsistent between session runs. - Based On Input Order. The output order is consistent between session runs when the input data order is consistent between session runs. - Always. The order of the output data is consistent between session runs even if the order of the input data is inconsistent between session runs. Default is Never.

Generate Transaction: The transformation generates transaction rows. Enable this property for query mode SQL transformations that commit data in an SQL query. Default is disabled.

Requires Single Thread Per Partition: Indicates whether the Integration Service processes each partition of a procedure with one thread.

Output is Deterministic: The transformation generates consistent output data between session runs. Enable this property to perform recovery on sessions that use this transformation. Default is enabled.

Create Mapping: Step 1: Creating a flat file and importing the source from the flat file.

Create a notepad file and in it create a table named bikes with three columns and three records. Create one more notepad file to hold the path to the bikes file: inside this notepad just type C:\bikes.txt and save it.

Import the source (the second notepad file) using Sources -> Import from File. This opens a wizard with three subsequent windows; follow the on-screen instructions to complete the process of importing the source.

Step 2: Importing the target and applying the transformation. In the same way as specified above, go to Targets -> Import from File and select an empty notepad file under the name targetforbikes (this is one more blank notepad file which we should create and save under that name in C:\).

Create two columns in the target table, named report and error. We are all set here. Now apply the SQL transformation. In the first window of the SQL transformation wizard, select Script mode. Connect the Source Qualifier to the ScriptName port under the inputs and connect the other two output fields to the target correspondingly.

A snapshot of the steps discussed above is given below.

Step 3: Design the workflow and run it.


Create the task and the workflow using the naming conventions. Go to the Mappings tab and click on the source in the left-hand pane to specify the path for the output file.

Step 4: Preview the output data on the target table.

Update Strategy Transformation Active and Connected Transformation

Till now, we have only inserted rows into our target tables. What if we want to update, delete, or reject rows coming from the source based on some condition? Example: If the address of a CUSTOMER changes, we can update the old address or keep both the old and the new address, one row for the old and one for the new; this way we maintain the historical data. The Update Strategy is commonly used with a Lookup transformation: in a DWH, we create a Lookup on the target table to determine whether a row already exists or not, and then insert, update, delete, or reject the source record as per the business need.

In Power Center, we set the update strategy at two different levels: 1. Within a session 2. Within a Mapping 1. Update Strategy within a session: When we configure a session, we can instruct the IS to either treat all rows in the same way or use instructions coded into the session mapping to flag rows for different database operations. Session Configuration: Edit Session -> Properties -> Treat Source Rows as: (Insert, Update, Delete, and Data Driven). Insert is default. Specifying Operations for Individual Target Tables:

You can set the following update strategy options: Insert: Select this option to insert a row into a target table. Delete: Select this option to delete a row from a table. Update: We have the following options in this situation:

Update as Update. Update each row flagged for update if it exists in the target table. Update as Insert. Insert each row flagged for update. Update else Insert. Update the row if it exists; otherwise, insert it.

Truncate table: Select this option to truncate the target table before loading data. 2. Flagging Rows within a Mapping

Within a mapping, we use the Update Strategy transformation to flag rows for insert, delete, update, or reject. Each operation has a constant and a numeric value:

INSERT: DD_INSERT (0)
UPDATE: DD_UPDATE (1)
DELETE: DD_DELETE (2)
REJECT: DD_REJECT (3)

Update Strategy Expressions: Frequently, the update strategy expression uses the IIF or DECODE function from the transformation language to test each row to see if it meets a particular condition. IIF( ( ENTRY_DATE > APPLY_DATE), DD_REJECT, DD_UPDATE ) or, equivalently, IIF( ( ENTRY_DATE > APPLY_DATE), 3, 1 )

The above expression is written in the Properties tab of the Update Strategy transformation. DD stands for DATA DRIVEN.
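As a sketch of the insert-versus-update pattern described earlier (the lookup name lkp_Target_Customer and the port CUSTOMER_ID are illustrative, not part of the example mappings above), an unconnected Lookup on the target table can drive the flag:

IIF( ISNULL( :LKP.lkp_Target_Customer(CUSTOMER_ID) ), DD_INSERT, DD_UPDATE )

If the Lookup finds no matching row in the target, the row is flagged for insert; otherwise it is flagged for update.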

Forwarding Rejected Rows: We can configure the Update Strategy transformation to either pass rejected rows to the next transformation or drop them. Steps: 1. Create the Update Strategy transformation. 2. Pass all ports needed to it. 3. Set the expression in the Properties tab. 4. Connect it to other transformations or the target.

Performance tuning: 1. Use the Update Strategy transformation as little as possible in the mapping. 2. Do not use the Update Strategy transformation if we just want to insert into the target table; instead use a direct mapping, direct filtering, etc. For updating or deleting rows from the target table, we can use the Update Strategy transformation itself.
