
Q. What type of repositories can be created using Informatica Repository Manager?

A. Informatica PowerCenter includes the following types of repositories:


- Standalone Repository: A repository that functions individually and is unrelated to any other repository.
- Global Repository: A centralized repository in a domain. This repository can contain objects shared across the repositories in the domain; the objects are shared through global shortcuts.
- Local Repository: A repository within a domain that is not the global repository. A local repository can connect to the global repository using global shortcuts and can use the objects in its shared folders.
- Versioned Repository: Either a local or a global repository that has version control enabled. A versioned repository can store multiple copies, or versions, of an object. This feature allows metadata to be developed, tested, and deployed into the production environment efficiently.
Q. What is a code page?
A. A code page contains the encoding to specify characters in a set of one or more languages. The code page is selected based on the source of the data. For example, if the source contains Japanese text, then a code page that supports Japanese text should be selected.
When a code page is chosen, the program or application for which the code page is set refers to a specific set of data that describes the characters the application recognizes. This influences the way the application stores, receives, and sends character data.
Q. Which databases can PowerCenter Server on Windows connect to?
A. PowerCenter Server on Windows can connect to the following databases:
- IBM DB2
- Informix
- Microsoft Access
- Microsoft Excel
- Microsoft SQL Server
- Oracle
- Sybase
- Teradata
Q. Which databases can PowerCenter Server on UNIX connect to?
A. PowerCenter Server on UNIX can connect to the following databases:
- IBM DB2
- Informix
- Oracle
- Sybase
- Teradata
Informatica Mapping Designer
Q. How do you execute a PL/SQL script from an Informatica mapping?
A. The Stored Procedure (SP) transformation can be used to execute PL/SQL scripts. The PL/SQL procedure name is specified in the SP transformation. Whenever the session is executed, the session calls the PL/SQL procedure.
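As an illustration, here is a minimal sketch of the kind of PL/SQL procedure an SP transformation could call; the procedure name, table, and parameters are hypothetical:

    -- Hypothetical procedure: marks a batch as processed and returns a status.
    CREATE OR REPLACE PROCEDURE sp_mark_batch_processed (
        p_batch_id IN  NUMBER,
        p_status   OUT VARCHAR2
    ) AS
    BEGIN
        UPDATE audit_log
        SET    processed = 'Y'
        WHERE  batch_id  = p_batch_id;
        p_status := 'OK';
    END sp_mark_batch_processed;
    /

In the SP transformation, sp_mark_batch_processed would be entered as the procedure name, with ports mapped to p_batch_id and p_status.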
Q. How can you define a transformation? What are the different types of transformations available in Informatica?
A. A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of
transformations that perform specific functions. For example, an Aggregator transformation performs calculations
on groups of data. Below are the various transformations available in Informatica:
- Aggregator
- Application Source Qualifier
- Custom
- Expression
- External Procedure
- Filter
- Input
- Joiner
- Lookup
- Normalizer
- Output
- Rank
- Router
- Sequence Generator
- Sorter
- Source Qualifier
- Stored Procedure
- Transaction Control
- Union
- Update Strategy
- XML Generator
- XML Parser
- XML Source Qualifier
Q. What is a source qualifier? What is meant by Query Override?
A. A Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a relational or flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation.
The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier can override this default query through the transformation properties; this is the Query Override. The list of selected ports, and the order in which they appear in the default query, must not be changed in the overridden query.
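As an illustration (table and column names are hypothetical), an override keeps the same ports in the same order while it may add, for example, a WHERE clause or an ORDER BY:

    -- Default query generated by the Source Qualifier:
    SELECT CUSTOMERS.CUST_ID, CUSTOMERS.CUST_NAME, CUSTOMERS.CREATED_DATE
    FROM   CUSTOMERS

    -- Overridden query: same ports, same order, with a filter and a sort added:
    SELECT CUSTOMERS.CUST_ID, CUSTOMERS.CUST_NAME, CUSTOMERS.CREATED_DATE
    FROM   CUSTOMERS
    WHERE  CUSTOMERS.CREATED_DATE >= TO_DATE('2007-01-01', 'YYYY-MM-DD')
    ORDER  BY CUSTOMERS.CUST_ID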
Q. What is aggregator transformation?
A. The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums. Unlike the Expression transformation, the Aggregator transformation can only be used to perform calculations on groups; the Expression transformation permits calculations on a row-by-row basis only.
The Aggregator transformation contains group by ports that indicate how to group the data. While grouping the data, the Aggregator transformation outputs the last row of each group unless otherwise specified in the transformation properties.
The aggregate functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, and VARIANCE.
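In SQL terms, an Aggregator with a group by port on, say, a department column computes roughly the following (a sketch with hypothetical table and column names):

    SELECT DEPT_ID,
           AVG(SALARY) AS AVG_SALARY,
           SUM(SALARY) AS TOTAL_SALARY,
           COUNT(*)    AS EMP_COUNT
    FROM   EMPLOYEES
    GROUP  BY DEPT_ID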
Q. What is Incremental Aggregation?
A. When a session is created for a mapping that contains an Aggregator transformation, the Incremental Aggregation session option can be enabled. When PowerCenter performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform the new aggregate calculations incrementally.
Q. How Union Transformation is used?
A. The Union transformation is a multiple input group transformation that can be used to merge data from multiple sources (or pipelines). It works just like the UNION ALL statement in SQL, which is used to combine the result sets of two SELECT statements.
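In SQL terms, merging two pipelines through a Union transformation is comparable to the following (hypothetical tables); note that, like UNION ALL, it does not remove duplicate rows:

    SELECT ORDER_ID, AMOUNT FROM ORDERS_EAST
    UNION ALL
    SELECT ORDER_ID, AMOUNT FROM ORDERS_WEST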
Q. Can two flat files be joined with Joiner Transformation?
A. Yes, the Joiner transformation can be used to join data from two flat file sources.
Q. What is a look up transformation?
A. This transformation is used to look up data in a flat file or a relational table, view, or synonym. It compares Lookup transformation port values (input ports) to lookup source column values based on the lookup condition. The returned values can then be passed on to other transformations.
Q. Can a lookup be done on Flat Files?
A. Yes.
Q. What is the difference between a connected look up and unconnected look up?
A. A connected lookup takes input values directly from other transformations in the pipeline.
An unconnected lookup does not take inputs directly from any other transformation; instead, it can be used in any transformation (such as an Expression) and invoked as a function using the :LKP expression. An unconnected lookup can therefore be called multiple times in a mapping.
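For example, an Expression transformation output port could invoke an unconnected lookup like this (the lookup name and ports are hypothetical; :LKP is the standard calling syntax):

    IIF(ISNULL(CUST_NAME), :LKP.LKP_GET_CUSTOMER_NAME(CUST_ID), CUST_NAME)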
Q. What is a mapplet?
A. A mapplet is a reusable object that is created using the Mapplet Designer. A mapplet contains a set of transformations and allows that transformation logic to be reused in multiple mappings.
Q. What does reusable transformation mean?
A. A reusable transformation can be used multiple times in a mapping. It is stored as metadata, separate from any mapping that uses it. Whenever changes are made to a reusable transformation, all the mappings in which it is used are invalidated.
Q. What is update strategy and what are the options for update strategy?
A. Informatica processes the source data row by row. By default, every row is marked to be inserted into the target table. If a row has to be updated or inserted based on some logic, the Update Strategy transformation is used; the condition that marks a processed row for update or insert is specified in the Update Strategy expression.
The following options are available for the update strategy:
- DD_INSERT: Flags the row for insertion. The equivalent numeric value of DD_INSERT is 0.
- DD_UPDATE: Flags the row for update. The equivalent numeric value of DD_UPDATE is 1.
- DD_DELETE: Flags the row for deletion. The equivalent numeric value of DD_DELETE is 2.
- DD_REJECT: Flags the row for rejection. The equivalent numeric value of DD_REJECT is 3.
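For example, an Update Strategy expression could flag rows like this (the IS_NEW_ROW port is hypothetical; DD_INSERT and DD_UPDATE are the constants listed above):

    IIF(IS_NEW_ROW = 1, DD_INSERT, DD_UPDATE)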
Types of Dimensions
Conformed Dimensions, Junk Dimensions, and Degenerated Dimensions

Conformed Dimensions (CD): These are dimensions that are built once in your model and can be reused multiple times with different fact tables. For example, consider a model containing multiple fact tables representing different data marts, and look for a dimension that is common to those fact tables. If, say, the product dimension is common, it can be reused by creating shortcuts and joining it to the different fact tables. Typical examples are the time, customer, and product dimensions.
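A minimal sketch of a conformed dimension (all names hypothetical): one product dimension referenced by two fact tables from different data marts:

    CREATE TABLE DIM_PRODUCT (
        PRODUCT_KEY  NUMBER PRIMARY KEY,
        PRODUCT_NAME VARCHAR2(100)
    );

    CREATE TABLE FACT_SALES (
        PRODUCT_KEY NUMBER REFERENCES DIM_PRODUCT (PRODUCT_KEY),
        SALE_AMT    NUMBER
    );

    CREATE TABLE FACT_RETURNS (
        PRODUCT_KEY NUMBER REFERENCES DIM_PRODUCT (PRODUCT_KEY),
        RETURN_AMT  NUMBER
    );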

Junk Dimensions (JD): Instead of having hundreds of small dimensions with only a few records each, cluttering the database with mini 'identifier' tables, the records from all these small dimension tables are consolidated into ONE dimension table, which we call a junk dimension table (since we are storing all the 'junk' in this one table). For example, a company might have a handful of manufacturing plants, a handful of order types, and so on, and these can be consolidated into one junk dimension table.
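A sketch of such a junk dimension (names hypothetical), with one row per observed combination of the small identifier attributes:

    CREATE TABLE DIM_JUNK (
        JUNK_KEY    NUMBER PRIMARY KEY,
        PLANT_CODE  VARCHAR2(10),
        ORDER_TYPE  VARCHAR2(10)
    );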
Degenerated Dimension (DD): An item that is in the fact table but is stripped of its description, because the description belongs in a dimension table, is referred to as a degenerated dimension. Since it looks like a dimension, but really lives in the fact table and has been 'degenerated' of its description, it is called a degenerated dimension.
Now, coming to Slowly Changing Dimensions (SCD) and Slowly Growing Dimensions (SGD): I would classify these more as attributes of dimensions themselves.

Although others might disagree with this view, Slowly Changing Dimensions are basically those dimensions whose key value remains static but whose description might change over a period of time. For example, a product id in a company's product line might remain the same, but its description might change from time to time; hence, the product dimension is called a slowly changing dimension.

Let's consider a customer dimension, which has a unique customer id, but where the customer name (company name) might change periodically due to buyouts or acquisitions. That makes it a slowly changing dimension: the customer number is static but the customer name changes. On the other hand, the company keeps adding customers to its existing list, and it is highly unlikely that the company will acquire an astronomical number of customers overnight (wouldn't the company CEO love that). Hence, the customer dimension is both a slowly changing and a slowly growing dimension.
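A minimal sketch of the slowly changing case (names hypothetical): the customer key stays static while the description is overwritten in place:

    UPDATE DIM_CUSTOMER
    SET    CUSTOMER_NAME = 'NewCo Inc.'   -- description changes after an acquisition
    WHERE  CUSTOMER_ID   = 1001;          -- key value remains static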

How do you identify the bottlenecks in Mappings?


Bottlenecks can occur in:

A. Targets
The most common performance bottleneck occurs when the Informatica server writes to a target database. You can identify a target bottleneck by configuring the session to write to a flat file target. If the session performance increases significantly when you write to a flat file, you have a target bottleneck.
Solutions:
- Drop or disable indexes or constraints.
- Perform bulk loads (this ignores the database log).
- Increase the commit interval (recovery is compromised).
- Tune the database (RBS, dynamic extension, etc.).
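For example, dropping or disabling target indexes and constraints before a bulk load could look like this in Oracle (object names hypothetical):

    ALTER TABLE TGT_SALES DISABLE CONSTRAINT TGT_SALES_FK;  -- disable a constraint
    ALTER INDEX TGT_SALES_IDX UNUSABLE;                     -- take an index out of play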
B. Sources
Add a Filter transformation after each Source Qualifier with a condition that lets no records through; if the time taken is the same, there is a source problem. You can also identify a source problem with a read test session: copy the mapping with only the sources and Source Qualifiers, remove all other transformations, and connect it to a file target. If the performance is the same, there is a source bottleneck. Another option is to use a database query: copy the read query directly from the session log and execute it against the source database with a query tool. If the time it takes to execute the query and the time to fetch the first row are significantly different, the query can be modified using optimizer hints.
Solutions: Optimize queries using hints. Use indexes wherever possible.
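An optimizer hint on the read query might look like this (table, index, and columns hypothetical):

    SELECT /*+ INDEX(ORDERS ORDERS_DATE_IDX) */
           ORDER_ID, ORDER_DATE, AMOUNT
    FROM   ORDERS
    WHERE  ORDER_DATE >= TO_DATE('2007-01-01', 'YYYY-MM-DD')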

C. Mapping

If both the source and the target are OK, then the problem could be in the mapping. Add a Filter transformation before the target; if the time is the same, there is a mapping problem. (OR) Look at the performance monitor in the session property sheet and view the counters: high error-row and rows-in-lookup-cache counters indicate a mapping bottleneck.

Solutions: Optimize for single-pass reading, and optimize the individual transformations as follows.

Optimize the Lookup transformation:

1. Cache the lookup table.
When caching is enabled, the Informatica server caches the lookup table and queries the cache during the session. When this option is not enabled, the server queries the lookup table on a row-by-row basis. The cache can be static, dynamic, shared, unshared, or persistent.

2. Optimize the lookup condition.
When multiple conditions are placed, the conditions with an equality sign should take precedence (list them first).

3. Index the lookup table.
A cached lookup table should be indexed on the ORDER BY columns; the session log contains the ORDER BY statement. For an uncached lookup, the server issues a SELECT statement for each row passing into the Lookup transformation, so it is better to index the lookup table on the columns in the lookup condition.
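For example, for an uncached lookup whose condition matches on a single key column (table and column names hypothetical):

    CREATE INDEX LKP_CUSTOMER_IDX ON LKP_CUSTOMER (CUST_ID);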

Optimize the Filter transformation:

You can improve efficiency by filtering early in the data flow. Instead of using a Filter transformation halfway through the mapping to remove a sizable amount of data, use a Source Qualifier filter to remove those same rows at the source. If it is not possible to move the filter into the Source Qualifier, move the Filter transformation as close to the Source Qualifier as possible to remove unnecessary data early in the data flow.
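Moving a filter into the Source Qualifier effectively adds a WHERE clause to the generated query, for example (names hypothetical):

    SELECT ORDERS.ORDER_ID, ORDERS.STATUS, ORDERS.AMOUNT
    FROM   ORDERS
    WHERE  ORDERS.STATUS = 'ACTIVE'   -- rows removed at the source, not mid-mapping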

Optimize the Aggregate transformation:

1. Group by simpler columns, preferably numeric columns.
2. Use sorted input. Sorted input decreases the use of aggregate caches; the server assumes all input data is sorted and performs aggregate calculations as it reads.
3. Use incremental aggregation in the session property sheet.

Optimize the Sequence Generator transformation:

1. Try creating a reusable Sequence Generator transformation and using it in multiple mappings.
2. The Number of Cached Values property determines the number of values the Informatica server caches at one time.

Optimize the Expression transformation:

1. Factor out common logic.
2. Minimize aggregate function calls.
3. Replace common sub-expressions with local variables.
4. Use operators instead of functions.
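As an example of point 4, the concatenation operator is cheaper than nested function calls (ports hypothetical):

    -- Before: CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)
    -- After:
    FIRST_NAME || ' ' || LAST_NAME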
D. Sessions

If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck. You can identify a session bottleneck by using the performance details. The Informatica server creates performance details when you enable Collect Performance Data on the General tab of the session properties. Performance details display information about each Source Qualifier, target definition, and individual transformation. All transformations have some basic counters that indicate the number of input rows, output rows, and error rows. Any value other than zero in the readfromdisk and writetodisk counters for the Aggregator, Joiner, or Rank transformations indicates a session bottleneck. Low BufferInput_efficiency and BufferOutput_efficiency counters also indicate a session bottleneck. Small cache sizes, low buffer memory, and small commit intervals can cause session bottlenecks.

E. System (Networks)
Data Warehouse Basic Concepts

What is a Data Warehouse?
A Data Warehouse is the "corporate memory". Academics will say it is a subject-oriented, point-in-time, inquiry-only collection of operational data.
Typical relational databases are designed for on-line transaction processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently from traditional relational databases.

What is ETL?
ETL is the data warehouse acquisition process of Extracting, Transforming (or Transporting), and Loading (ETL) data from source systems into the data warehouse.
What is the difference between a data warehouse and a data mart?
There are inherent similarities between the basic constructs used to design a data warehouse and a data mart. In general, a data warehouse is used at an enterprise level, while data marts are used at a business division/department level. A data mart contains only the subject-specific data required for local analysis.

What is the difference between a W/H and an OLTP application?

Typical relational databases are designed for on-line transaction processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently from traditional relational databases.
Warehouses are time-referenced, subject-oriented, non-volatile (read only), and integrated. OLTP databases are designed to maintain atomicity, consistency, and integrity (the "ACID" tests). Since a data warehouse is not updated, these constraints are relaxed.

What is the difference between OLAP, ROLAP, MOLAP, and HOLAP?

ROLAP, MOLAP, and HOLAP are specialized OLAP (On-Line Analytical Processing) applications.
ROLAP stands for Relational OLAP. Users see their data organized in cubes with dimensions, but the data is really stored in a relational database (RDBMS) like Oracle. Because the RDBMS stores data at a fine-grained level, response times are usually slow.
MOLAP stands for Multidimensional OLAP. Users see their data organized in cubes with dimensions, but the data is stored in a multi-dimensional database (MDBMS) like Oracle Express Server. In a MOLAP system, a lot of queries have a finite, precomputed answer, and performance, which is usually critical, is fast.

HOLAP stands for Hybrid OLAP; it is a combination of both worlds.

What is the difference between an ODS and a W/H?

An ODS (Operational Data Store) is an integrated database of operational data. Its sources include legacy systems, and it contains current or near-term data. An ODS may contain 30 to 90 days of information.

A warehouse typically contains years of data (time-referenced). Data warehouses group data by subject rather than by activity (subject-oriented). Other properties are: non-volatile (read only) and integrated.

When should one use an MD-database (multi-dimensional database) and not a relational one?
Data in a multi-dimensional database is stored the way business people view it, allowing them to slice and dice the data to answer business questions. When designed correctly, an OLAP database will provide much faster response times for analytical queries.

Normal relational databases store data in two-dimensional tables, and analytical queries against them are normally very slow.

How can Oracle Materialized Views be used to speed up data warehouse queries?
With "Query Rewrite" (QUERY_REWRITE_ENABLED=TRUE in INIT.ORA), Oracle can direct queries to use pre-aggregated tables instead of scanning large tables to answer complex queries. Materialized views in a W/H environment are typically referred to as summaries, because they store summarized data.
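A minimal sketch (the SALES table and columns are hypothetical): a summary that Oracle can substitute for matching aggregate queries when query rewrite is enabled:

    CREATE MATERIALIZED VIEW MV_SALES_BY_PRODUCT
      BUILD IMMEDIATE
      ENABLE QUERY REWRITE
    AS
    SELECT PRODUCT_ID, SUM(AMOUNT) AS TOTAL_AMOUNT, COUNT(*) AS ROW_COUNT
    FROM   SALES
    GROUP  BY PRODUCT_ID;

    -- With rewrite enabled, a query like this can be answered from the summary
    -- instead of scanning the large SALES table:
    SELECT PRODUCT_ID, SUM(AMOUNT) FROM SALES GROUP BY PRODUCT_ID;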
