1) What is the Admin Console?
ANS:
The Admin Console is a web-based client tool for administering all the Informatica client tools such as Repository Manager, Designer, etc., and for administering users and groups.
The administrator is responsible for creating the Repository Services and Integration Services and for controlling access to them.
------------------------------------------------------------------------------------------------------------------------------------------
2) Can anyone explain clearly what is meant by a dirty dimension and a junk dimension, with an example?
ANS:
Junk Dimension:
A dimension table containing flags, gender codes, free text, etc. that are not useful for generating reports is called a junk dimension.
Dirty Dimension:
3) I need your help with faster loading from a flat file to an Oracle table.
ANS:
1) Use the external loading option in Informatica at the session level. For this you need to configure SQL*Loader.
2) Split your single flat file in UNIX into about 10 files based on size, enable partitioning in the session, and select the target load type "Bulk".
(or)
I know there is the SQL*Loader feature in Informatica... Is there anything I have to do at the database level? I mean on the table, as the table is currently in an Oracle database.
4) Can anybody explain the difference between
1) a session variable and a mapping variable?
2) a session parameter and a mapping parameter?
ANS:
Session variable:
Yes, they are the same, but if we define the same variable in both the session and the mapping with some value, Informatica will take the value that has been defined in the session.
Mapping variable:
A mapping variable represents a value that can change during the mapping run.
A mapping variable is required to define incremental extraction; it is used together with the Source Qualifier transformation to perform incremental extraction.
5) 1) How do you load the first half of the records to one target and the remaining half to another target?
3) There are 2 workflows, wkf1 and wkf2. wkf2 should be executed only if wkf1 succeeds. How?
4) What is the vi editor?
6) Can a lookup transformation return more than one value?
(I) No, it cannot. It can return only one value. Google it and you will get better answers.
(II) No lookup can return more than one value.
If the lookup finds multiple matches, there are a few options you can select:
1) Custom
2) First
3) Last
In an Expression transformation you can split that field again into two fields and use them as two separate fields.
(III) Using an unconnected lookup we can still return multiple column values. Strictly speaking,
an unconnected lookup returns only one value, so here I am concatenating all the column values
using the || operator and returning that single column from the lookup. After that I split
the column back into multiple values using SUBSTR and INSTR.
Note: enable the lookup SQL override: select name || '~' || sal || '~' || loc || '~' || dname from
emp (give the return port the datatype string and length 10000 in the lookup).
For a given eno the lookup returns all the columns as one string; based on '~' we split the single column into multiple columns.
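The concatenate-and-split idea above can be sketched roughly as follows; the lookup name lkp_emp_info, the port names, and the '~' delimiter are illustrative assumptions, not from the original post.

    Lookup SQL override (returns one concatenated string per employee):
        SELECT empno, ename || '~' || sal || '~' || loc || '~' || dname AS emp_info FROM emp

    Expression transformation ports (variable/output port : expression):
        v_info  : :LKP.lkp_emp_info(empno)
        o_name  : SUBSTR(v_info, 1, INSTR(v_info, '~', 1, 1) - 1)
        o_sal   : SUBSTR(v_info, INSTR(v_info, '~', 1, 1) + 1, INSTR(v_info, '~', 1, 2) - INSTR(v_info, '~', 1, 1) - 1)
        o_loc   : SUBSTR(v_info, INSTR(v_info, '~', 1, 2) + 1, INSTR(v_info, '~', 1, 3) - INSTR(v_info, '~', 1, 2) - 1)
        o_dname : SUBSTR(v_info, INSTR(v_info, '~', 1, 3) + 1)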
7) How do you load the file name of the flat file into the session statistics table, that is, the audit table?
(I) Which flat file name do you want to load? I mean, is it a source or a target, or anything else?
(II) From Informatica 8 onwards it is very simple.
This means after session1 is completed I will run session2 (this is the one that loads into the audit table).
Here I am wondering how we can get the "Currently Processed File Name" from session1 into session2.
------------------------------------------------------------------------------------------------------------------
8) Joiner and Lookup transformations: which one gives better performance?
1) How can we tell which particular transformation (lookup or joiner) will give good performance?
Ans) My answer is that both are the same, but instead of using multiple transformations we use a mapplet; however, the mapplet will not show the transformations....
Both have their own advantages and disadvantages; performance differs based on the requirement.
A Joiner takes the master table into the cache by default, so it always uses a cache for the join.
A Lookup also uses a cache, but you can minimize the cache usage in a lookup, and you can write a SQL override in the lookup to filter only the required fields and records by mentioning some conditions in the override.
2) Instead of multiple transformations we do not use mapplets. Mapplets are preferred for holding repeated logic and reusing it in different mappings. Performance will not differ by changing multiple transformations into a mapplet.
(II) A Lookup is a passive transformation, which means it processes all the records whether the condition is true or false: if the condition is true it sends the corresponding value, and if the condition is false it sends NULL. A Joiner, on the other hand, is active and passes only the records that satisfy the join condition, so the Joiner is the better performer here.
But we cannot say which one is good in general; that depends on the requirement.
------------------------------------------------------------------------------------------------------------
9) 1) How do you load the first half of the records to one target and the remaining half to another target?
3) There are 2 workflows, wkf1 and wkf2. wkf2 should be executed only if wkf1 succeeds. How?
4) What is the vi editor?
-----------------------------------------------------------------------------------------------------------------
i/p2: e
f
g
h
ANS:
What I am asking is: are i/p 1 and 2 different files, or are the two ports in the same file?
Source1 --- SQ ---> SEQ --->
Source2 --- SQ ---> SEQ ---> Joiner [it joins the 2 pipelines into one; the condition is Source1.SEQ = Source2.SEQ] ---> load the records into the target.
Here I am generating sequence numbers for the first file and the second file, so based on the sequence numbers I join the two files.
10) Please, can anybody explain the UNIX commands which are frequently used in Informatica development?
ANS:
Regarding UNIX usage in Informatica:
First of all, in Informatica development projects UNIX is not mandatory; some projects use a Windows-based environment. You do not need to know UNIX commands when you are working in a Windows environment.
Even if you are working in a UNIX environment, if the sources and the target are databases then you hardly use the UNIX environment.
UNIX basic commands used in development:
11) How do you split one half of the records to one target and the other half to another target using an unconnected lookup? How is this done? Can anyone explain?
14) How can I look up data without using a Lookup transformation?
15) Table A:
15) table A:
col1
bangalore#chennai#Hyd
Delhi#bombay
Required format
ANS:
I have a flat file; the content of the file is huge and it contains a lot of unwanted data, and we should not use any transformations. The source is connected to the target as a one-to-one mapping only.
How do we remove the unwanted data and how can we achieve this? Please help me.
ANS:
(I) 1) Put some filter condition in the Source Qualifier, like Deptno = 10.
2) Use a pre-session UNIX script to validate the flat file, copy the required records from the flat file to another file, and then trigger the workflow.
(II) Can you please tell me where we have to implement the UNIX script?
(III) On the Informatica server itself.
Move/copy your file into a particular directory on the UNIX server where Informatica is installed, and from there use a UNIX script to do all the data validation and format correction, then trigger your Informatica job.
(IV) In the Source Qualifier transformation, write a query in the SQL override and use the user-defined join, source filter, and select distinct options; with these we can restrict the unwanted data.
17) I have two employee tables with the same structure but with different data. Now I want to find the max(sal) from the two tables and compare them, like max(sal) of emp1 > max(sal) of emp2. If emp1 has the max(sal) I need to pass one set of records, and if emp2 has the max(sal) I need to pass another set of records.
Please help me ASAP.
ANS:
1. The associated port is a default port that appears when we enable the dynamic cache in the Lookup transformation. It is compared with the input ports, and the comparison result populates the "NewLookupRow" port.
2. Data validation means that when source data comes into the staging area we have to validate it; we have 4 techniques for this.
3. Versioning is a good concept in Informatica: when a group of members accesses the same repository and you want to restrict others from accessing your mappings, use version control. There we have 2 options: check in and check out.
Check-in is used to commit the mapping; the Informatica server gives it a version number starting with 1.
Check-out is used to edit the mapping and apply changes.
19) Do you have any idea about an LRF (Load Ready File) in Informatica?
ANS:
(I) An LRF is an "indicator file". Using the Event Wait task in Informatica, we wait for that file in order to trigger the jobs/sessions.
(II) I think LRF is a different concept,
because an "indicator file" maintains row ids and is one of the session log statistics,
a type of output file.
The indicator file gives each and every record a different id:
if it is an insert it gives one id,
if it is an update it gives another id,
and so on.
20) Which transformation is used instead of a lookup to check whether a record already exists or not?
ANS:
(I) Stored Procedure transformation
(II) Joiner transformation
ANS:
@shashank
No need for a Union transformation here......
(III) Taking 3 instances of the target works, but the full set of records is not coming; I have also removed the primary key. When I used a Union transformation they got loaded successfully, with 3 duplicates of every record.
It is defined by the Informatica developers, depending on the logic. Mapping parameters and mapping variables are user-defined variables.
System variable:
---------------------
It is denoted by a triple dollar ($$$).
Example: $$$SessStartTime
@Chocsweet: Please check the points above; what you are saying is not the exact answer.
(III) If $ is used for predefined variables, then do session parameters come under predefined variables?
$session parameter
$InputFile
----------------------------------------------------------------------------------------------------
2. What is pipeline partitioning?
ANS: THROUGHPUT is nothing but the speed at which the Informatica server reads the data from the source and writes the data to the target per second. It is displayed when you double-click a session in the Workflow Monitor; it shows a window with the speed at which the Informatica server reads and writes. Check it out.
ANS:
1. The SQL transformation supports DDL and DML commands, whereas other transformations do not support DDL commands.
2. Use it when you want to create a table dynamically (i.e., while the session is running).
3. Whether to use it also depends on the LLD preparation; if the LLD calls for it, we must use the SQL transformation. It also depends on client requirements.
ANS
There are different situations in which mappings or workflows are locked by other users.
1) The repository gets disconnected while you are working on one of the mappings. If you reconnect to the repository and open the mapping you were working on, sometimes it says the mapping is locked by user (Pavan).
2) If you have not closed the Designer, or the Workflow Manager was not disconnected properly and you directly closed the window, locks will remain on the corresponding mapping or workflow.
3) If two people are working on the same mapping/workflow, the second person will be prompted with a message "m_test mapping is locked by user (pavan)".
If you want to release the locks you have to go to the Repository Manager, check the user locks, and release them.
(III) It is a write-intent lock if some user has the object open.
This lock can be released by disconnecting the respective connection id (user specific) for that object in the Admin Console.
(IV) I think from Infa 8.6.1 onwards this activity has been pushed to the Admin Console to manage,
but in earlier versions all the locks could be managed from the Repository Manager.
ANS:
(I) I think a Normalizer will not work here; maybe it can be achieved by using an Aggregator transformation, but I do not know exactly.
(II) I think the Normalizer transformation would give us the input of this example as its output, if the output of this particular example were used as the Normalizer's input.
27) In which scenario do we go for mapplets? Please tell me a real-time example.
ANS:
(I) Definition and Limits
Mapplets
When the server runs a session using a mapplet, it expands the mapplet. The server then runs the session as it would any other session, passing data through each transformation in the mapplet.
If you use a reusable transformation in a mapplet, changes to it can invalidate the mapplet and every mapping using the mapplet.
Mapplet objects:
(a) Input transformation
(b) Source qualifier
(c) Transformations, as you need
(d) Output transformation
A mapplet cannot include:
- Joiner
- Normalizer
- Pre/Post-session stored procedure
- Target definitions
- XML source definitions
Types of Mapplets:
Copied mapplets are not instances of the original mapplet. If you make changes to the original, the copy does not inherit your changes.
You can use a single mapplet more than once in a mapping.
Ports
Example
In one of my projects we have an error strategy which is applicable to all the mappings.
It captures the error records, flags them, attaches an error message to the error records, and writes them to the error table. This logic is common to all the mappings, so we implemented it in a MAPPLET.
Anyone in any domain can use this as an example of a mapplet.
We route the error records based on whether the key columns are null or contain invalid values; whatever scenarios we consider to be errors, we capture those and send only the error records to the mapplet input.
In the mapplet, we convert a single row into multiple rows if the record contains more than one error; for example, if one row has three errors we create three records out of it and show the three errors for the same record.
In the mapplet:
The mapplet actually only segregates the errors, assigns the error code, looks up the error message table based on the error code, assigns the error message to each error, and also calculates an error count for each record in an Expression transformation; based on that count we set the occurs option in the Normalizer to split the single row into multiple rows.
This part is outside the mapplet; I am just explaining the flow up to loading the data into the error table:
after that, the flow goes from the mapplet output ---> Normalizer ---> error table.
In the Normalizer, we have to set the occurs option based on the maximum number of possible errors.
In the Expression we validate what kind of error occurred and assign a code for each type of error; based on the code we look up the error message file.
There is no specific reason to use an unconnected lookup; you can use a connected one also. We required only one return port from the lookup, so we used an unconnected lookup, which also gives a performance improvement. If you use a connected lookup, it creates a cache for each column in the lookup, which might hurt performance, so we used an unconnected one.
28) Hi All,
I am getting an error while installing Infa 8.6 on my PC. The OS is Vista and the database is Oracle 10g (Vista compatible).
Error details:
Informatica PowerCenter 8.6.0
Use the error below and catalina.out and node.log in the server/tomcat/logs directory on the current machine to get more information. Select Retry to continue the installation.
EXIT CODE: S
ANS:
(II) Thanks a lot, Aparna... As per your suggestions the Infa server got successfully installed.
I found a few errors with respect to the configuration in the Admin Console and Client; I will let you know later.
29) I am new to Informatica... Can anyone explain the incremental aggregation property to me? I know we need to check the incremental aggregation property in the session...
but my question is: just by checking that option, how does Informatica filter out only the new records from the source? Do we need to do some kind of lookup in the mappings, or is it enough if we just check the incremental aggregation option?
ANS:
(I) When you send data a second time and you have new values to be aggregated, then you go for incremental aggregation. When you select incremental aggregation in the session properties, the data is copied into a cache, the cache is checked for the new values, and the aggregation is performed.
(II) Thanks Shashank... You mean to say the source data is copied as-is into the cache,
and it compares the new source with the cache and processes only the new records for aggregation?
(III) It is not the source data which goes into the cache; it is the target data, and the source data is compared with this cache.
(IV) Okay... but the target is aggregated... for example,
how does it compare?
(V) As per my knowledge,
the property (checkbox) specified in the session tab would NOT be responsible for filtering out the new records and aggregating them.
The option is used only for incremental aggregation, also called running aggregation.
The onus is on the user to filter the delta records (changed/new) on the source side and pass them on to the Aggregator.
When you check the option in the session properties, Infa creates two sets of caches for data and index (an original set of caches and a backup set of caches, which is used for recovery purposes).
eg:
existing data in the emp table as on 20sep
-------------------------------------------------
empid allowances date
1 100 20sep
2 200 20sep
Note the aggregation of the allowances that has taken place above and also the new record that has been inserted into the table.
(VI) Yes, that is correct, as explained by Shiva Prasad. In real time, incremental aggregation is used, for example, in the telecom industry and the insurance domain.
* In the telecom domain, for postpaid connections, the total balance of what you have talked until today for this month is calculated with incremental aggregation.
30) scenario
id name
1 abc
2 def
3 ghi
1 abc
2 def
2 def
3 ghi
----------
Target
-----------
id name
1 abc
1 abc1
2 def
2 def2
2 def3
3 ghi1
3 ghi2
How do we achieve this logic?
ANS:
(I) select id, name || decode(row_number() over (partition by id order by id), 1, null, row_number() over (partition by id order by id) - 1)
from Table
Use the above query in the Source Qualifier and load the records into your target.
(II) If this is a flat file, how do we solve this?
(III) Source (flat file) -- SQ --> Rank transformation (group by ID and select the number of ranks as, say, 10000) --> Expression (drag all the fields from the Rank transformation: id, name, rank).
Disable the output port in the Expression for the columns name and rank.
Create one output port named name_rn and give it this formula: name || decode(rank, 1, null, rank - 1).
Now connect id and name_rn from the Expression to the target.
--------------------------------------------------------------------------
31) Please tell me about ETL testing using Informatica:
the roles of ETL testing experts, and
at what stage ETL testing people come into the project.
ANS:
(I) Basically ETL testing is of 4 types. 1. Unit test: for developed mappings.
test procedure:
test case:
expected result:
actual result:
status: pass or fail
(II) Apart from mappings... the following test scenarios should be considered:
ANS:
(I) PETL_24049 Failed to get the initialization properties from the master service process for the prepare phase [Session task instance [Session_name]: Unable to read variable definition from parameter file [file_name]] with error code [32522632]
ANS:
(I) Migration of code from dev ---> test ---> prod has different methods; it differs from project to project.
ANS:
(I) Informatica, from 7.1.4 onwards, provides separate repository tables for error logging.
In the session properties you have an option to log the errors into error tables; Informatica builds 4 separate error tables to log the errors.
For error information, Infa 7.1 onwards provides 4 in-built error tables:
PMERR_SESS, PMERR_TRANS, PMERR_DATA, PMERR_MSG. These tables log the error details on a row-by-row basis.
These errors relate only to Informatica functions, validations, or technical errors, not to data-related errors; Informatica logs technical issues related to the Informatica process.
... and 3 columns into target2.
How?
ANS:
(I) From the Source Qualifier, directly connect the first 3 columns to the first target and the remaining 3 columns to the second target.
Siva's answer is correct: you can directly connect the columns to different targets based on the requirement.
(III) I don't think you need to go for a Sorter and a Rank for this solution.
Check the problem clearly; it is simply loading from source to target:
the first three columns need to be loaded to target 1 and the next three columns to target 2.
Without using an Update Strategy transformation, can we update the target in Informatica?
If it is possible, how?
ANS:
(I) By using the session properties (set "Treat Source Rows As" to Update).
ANS: (I) In Cognos you will find the GO Sales database with lakhs of records.
(II) We have the SH (Sales History) and OE (Order Entry) schemas in Oracle, so those may be helpful for your requirement.
Please answer this question for me; I faced this question in a DELL interview.
ANS:
(I) Hi... The only solution that I can think of as of now is to split the file into two based on the size and then load them into their respective targets.
1) Two DB users
2) Oracle port number (default is 1521), Oracle SID (default is ORCL)
And the other parts of the installation are
40) I have 2 workflows, namely wkf1 and wkf2. After the execution of wkf1 we have to execute wkf2. If wkf1 is not executed, wkf2 should not be executed; we have to do this automatically.
ANS:
(I) Use a Command task in workflow1 as the final task and start workflow2 using the pmcmd command. Once all the tasks in workflow1 are executed successfully, the pmcmd command in the Command task will trigger workflow2 automatically.
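For reference, the Command task entry would typically be a pmcmd call along these lines; the service, domain, user, folder, and password values here are placeholders, and the exact options depend on the PowerCenter version:

    pmcmd startworkflow -sv INT_SVC_DEV -d Domain_Dev -u admin -p admin_pwd -f DEV_FOLDER wkf2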
(II) Hi Prasad,
With this approach we cannot achieve all the requirements: you can start wkf2 at any time in your approach, but the requirement is that wkf2 should not run if wkf1 has not succeeded.
The solution in this case is to create a flat file at the end of wkf1 by using touch xyz.txt, and in wkf2 to use an Event Wait task with a file-watch event on that flat file. With this solution, if you run wkf2 it will wait for the flat file to be created; if the file is not created, wkf2 will not run.
ANS:
(I) Use a Sorter transformation and check the Distinct option in the Properties tab.
While loading from a flat file (fixed width) to an Oracle table, the data got loaded successfully, but when I checked the session log file I got the below error:
Severity: ERROR
Timestamp: 9/16/2010 1:31:45 PM
Node: NODE_02
Thread: TRANSF_1_1_1
Process ID: 6749
Message Code: TT_11132
Message: Transformation [e_Donnelly] had an error evaluating output column [v_DATE2]. Error
message is [<<Expression Error>> [TO_DATE]: invalid string for converting to Date
... t:TO_DATE(s:LTRIM(s:RTRIM(s:' ',s:' '),s:' '),s:'YYYY-MM-DD')].
ANS:
(I) Hi Karthik,
This error occurs when incompatible date formats are used. The default date format in PowerCenter is MM/DD/YYYY HH24:MI:SS, and hence SYSDATE is converted to MM/DD/YYYY HH24:MI:SS, but in this case TO_DATE is expecting the input in the DD-MON-YYYY format.
Solution
To resolve this, convert the date format of SYSDATE to the format used by the TO_DATE function.
Example:
TO_DATE(TO_CHAR(SYSDATE,'DD-MON-YYYY'),'DD-MON-YYYY')
The data is getting loaded, but some rows have no value and this is creating errors for those rows.
ANS:
(I) You have to define the join condition in the Source Qualifier.
source1 --- source1_SQ
                        ---------> Expression transformation
source2 --- source2_SQ
I want to send ...
How do I do this?
ANS:
(I) Use an Expression transformation to compare the rows and set a flag, then use a Router transformation.
Table data:
DEPTNO,DEPT_NAME
10,AAA,CHE
10,BBB,BGR
20,CCC,HYD
30,DDD,KMU
30,EEE,TPJ
-----------------------------------------------------
45) Please explain this query:
Select * from emp e where &N = (select count(distinct(sal)) from emp f where f.sal >= e.sal)
Before executing this query you can give the command "set autotrace on". Then you will know how the SQL got executed.
(II) Hi Aparna,
Thanks for your reply, but it is showing the following errors. Could you help me?
SQL> set autotrace on
SP2-0613: Unable to verify PLAN_TABLE format or existence
SP2-0611: Error enabling EXPLAIN report
SP2-0618: Cannot find the Session Identifier. Check PLUSTRACE role is enabled
SP2-0611: Error enabling STATISTICS report
(IV) I am using Oracle 9i. You have to run these scripts before you set autotrace on:
$ORACLE_HOME/sqlplus/admin/plustrce.sql
$ORACLE_HOME/rdbms/admin/utlxplan.sql
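As for what the query itself does: the correlated subquery counts how many distinct salaries are greater than or equal to the current row's salary, so substituting a number for &N returns the employee(s) with the Nth highest salary. A small worked sketch (the sample salaries are illustrative only):

    -- N = 2 returns the employee(s) with the 2nd highest salary
    SELECT *
    FROM   emp e
    WHERE  2 = (SELECT COUNT(DISTINCT f.sal)
                FROM   emp f
                WHERE  f.sal >= e.sal);
    -- e.g. with salaries 5000, 3000, 3000, 1000 the rows with sal = 3000 are returned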
ANS: The Router is an active transformation because it can change the number of rows that pass through it (a row can satisfy more than one group condition or none at all), so it changes the row count of the pipeline.
---------------------------------------------------------------------------------------------------------------
47) Suppose I have one source which is linked to 3 targets. When the workflow runs for the first time, only the first target should be populated and the other two (second and third) should not be populated. When the workflow runs for the second time, only the second target should be populated and the other two (first and third) should not be populated. When the workflow runs for the third time, only the third target should be populated and the other two (first and second) should not be populated. Could anyone help?
ANS:
Table1 : MOD(SEQ_VALUE, 3) = 0
Table2 : MOD(SEQ_VALUE, 3) = 1
Table3 : MOD(SEQ_VALUE, 3) = 2
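One way to obtain such a SEQ_VALUE per run is a mapping variable that counts the workflow runs (incremented once per run, for example with SETVARIABLE in an Expression transformation, or supplied through a parameter file). A rough sketch of the Router group conditions, assuming a variable $$RUN_COUNT holds the current run number (the variable and group names are illustrative):

    TABLE1_GROUP : MOD($$RUN_COUNT, 3) = 1
    TABLE2_GROUP : MOD($$RUN_COUNT, 3) = 2
    TABLE3_GROUP : MOD($$RUN_COUNT, 3) = 0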
id
1
2
3
target2:
id
1
1
1
2
2
2
3
3
3
ANS:
(I) This question is similar to a past post.
To do this...
I need the output in such a way that the unique values are routed into one target and the duplicate values into the other, within a given mapping only.
values into other with in a given mapping only
SNO,SNAME,EDUCATION
101,VIJAY,BTECH
102,PRAMODH,BCOM
104,SHANKAR,MSC
107,SNEHA,MCA
Din & Ramesh is correct. may be u got confused with Jai answer using aggregator. now
aggregator is not required. here i'm giving you the coding check it out.
So what I want to suggest is …. We can have a look up to the existing table … and can
give a condition …. If that found put into table 1 if not put into table 2.
(VII) Shankar wants to findout the duplicate , unique record into two different targets in the
same day.
u are thinking abt next day also.If you use the lookup on the target table first day ur
approach won't workout. first day all the records won't be there in the target table so all
the records go to one target. i can understand ur point but shankar requirement is not
that.
ANS: You are solution is not completed/Correct. may be u got confused with the question.
Check the similar question in the old postshttp://www.orkut.co.in/Main#CommMsgs?
cmm=791012&tid=2543660813794278811&kw=v_count
---------------------------------------------------------------------------------------------------------------
50) What are the source commit and target commit intervals?
Where can we find rejected records and how do we reload those records?
ANS:
(I) The main difference between a Lookup and a Joiner is that
a Lookup can perform non-equijoins, whereas a Joiner performs only equijoins.
The second difference is that whenever we perform a join using a Joiner it needs a primary key - foreign key relationship,
while with a Lookup transformation we just need a matching port.
Target-based commit: the PowerCenter Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size, the commit interval, and the PowerCenter Server configuration for writer timeout.
Source-based commit: the PowerCenter Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.
User-defined commit: the PowerCenter Server commits data based on transactions defined in the mapping properties. You can also configure some commit and rollback options in the session properties.
Rejected records are saved in the form of a .bad file along with the session log.
(III) The source commit interval is after how many source records you want to commit the data loaded to the target...
To improve the session performance you can increase the commit interval.
51) In an Expression transformation no values are getting passed from the output port... Even if I 'hard code' any value in the output port it is not coming to the next transformation...
ANS:
(I) If the value is coming into the expression then it is not hard-coded.
Hard-coded means that the value remains the same for that port.
For example, if we want a port STATE to have the value 'New York', then we make that port output (checked), input (unchecked), and write 'New York' in the expression part.
51) Hi All,
Can anybody clarify these questions for me?
1) A surrogate key comes under which category?
2) Where do we use a surrogate key (technically)?
3) Which should we use first, and why?
i) Expression transformation
ii) Filter transformation
4) What is the basic difference between a star schema and a snowflake schema?
5) How do we do performance tuning in Informatica?
6) What is the difference between data migration and a data warehouse?
7) What are dynamic and static lookup transformations?
8) Why choose Oracle or SQL Server for a data warehouse?
ANS:
(I) 1) A surrogate key comes under which category?
2) Where do we use a surrogate key (technically)?
A surrogate key is an artificial identifier for an entity. Surrogate key values are generated by the system sequentially (like the Identity property in SQL Server and a Sequence in Oracle).
They do not describe anything.
A primary key is a natural identifier for an entity. For primary keys, all the values are entered manually by the user and are uniquely identified; there will be no repetition of data.
The need for a surrogate key rather than the natural primary key:
If a column is made a primary key and later there needs to be a change in the datatype or the length of that column, then all the foreign keys that depend on that primary key must also be changed, making the database unstable.
Surrogate keys make the database more stable because they insulate the primary and foreign key relationships from changes in the data types and lengths.
But as per the requirement, if necessary you can use an Expression transformation first. For example, you may have to set some flag and, based on that flag, filter records; in such a situation you have to use the Expression first.
(III) 5)How to do performance tuning in Informatica?
The most common performance bottleneck occurs when the PowerCenter Server
writes to a target database.
You can identify performance bottlenecks by the following methods:
♦ Running test sessions. You can configure a test session to read from a flat file source or
to write to a flat file target to identify source and target bottlenecks.
♦ Studying performance details. You can create a set of information called performance
details to identify session bottlenecks. Performance details provide information such as
buffer input and output efficiency.
♦ Monitoring system performance. You can use system monitoring tools to view percent
CPU usage, I/O waits, and paging to identify system bottlenecks.
Once you determine the location of a performance bottleneck, you can eliminate
the Bottleneck by following these guidelines:
♦ Eliminate source and target database bottlenecks. Have the database administrator
optimize database performance by optimizing the query, increasing the database network
packet size, or configuring index and key constraints.
♦ Eliminate mapping bottlenecks. Fine tune the pipeline logic and transformation settings
and options in mappings to eliminate mapping bottlenecks.
♦ Eliminate session bottlenecks. You can optimize the session strategy and use
performance details to help tune session configuration.
♦ Eliminate system bottlenecks. Have the system administrator analyze information from
system monitoring tools and improve CPU and network performance.
(IV) 4) What is the basic difference between a star schema and a snowflake schema?
Snowflake schema: a snowflake schema is a term that describes a star schema structure normalized through the use of outrigger tables, i.e., dimension table hierarchies are broken into simpler tables.
In a star schema every dimension will have a primary key.
• In a star schema, a dimension table will not have any parent table, while in a snowflake schema a dimension table will have one or more parent tables.
• Hierarchies for the dimensions are stored in the dimension table itself in a star schema, whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill down the data from the topmost level to the lowermost level.
• A star schema uses fewer joins than a snowflake schema, so the performance is faster.
• Last but not least, a star schema is more common than a snowflake schema.
(V) 6) What is the difference between data migration and a data warehouse?
Data migration: it is the process of migrating data from one database location, either relational or non-relational, to another database location.
7) What are dynamic and static lookup caches? A lookup can be uncached, use a static cache, or use a dynamic cache.
Static cache:
When the condition is true, the PowerCenter Server returns a value from the lookup table or cache.
When the condition is not true, the PowerCenter Server returns the default value for connected transformations and NULL for unconnected transformations.
Dynamic cache:
When the condition is true, the PowerCenter Server either updates rows in the cache or leaves the cache unchanged, depending on the row type.
This indicates that the row is in the cache and the target table; you can pass updated rows to the target table.
When the condition is not true, the PowerCenter Server either inserts rows into the cache or leaves the cache unchanged, depending on the row type.
This indicates that the row is not in the cache or the target table; you can pass inserted rows to the target table.
(VI) 8) Why choose Oracle or SQL Server for a data warehouse?
This is a debatable topic; there can be many reasons, such as technical, financial, user requirements, etc. One should give the reason after analysing all the factors. I have worked in both SQL Server and Oracle; both have pros and cons. One cannot easily say that Oracle is better than SQL Server.
----------------------------------------------------------------------------------------
52) 1) I have one table containing 1000 records. I want to load the first five records into one target and the next 5 records into a second target, alternating like this up to the 1000th record.
2) I have one table containing 10 records and that table contains some duplicates; now I want to load the duplicate records and the original records into two targets.
3) How do we load the same table 10 times into the target in one mapping?
ANS:
(I) Q1 -
4. At the workflow level you should also be able to use a workflow variable and a Decision task, I guess.
5. You can also schedule your workflow to run 10 times, one after another.
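For question 1, a common sketch is a running row counter in an Expression transformation feeding a Router; the port and group names below are illustrative only:

    Expression transformation ports:
        v_rownum (variable port) : v_rownum + 1
        o_group  (output port)   : MOD(FLOOR((v_rownum - 1) / 5), 2)

    Router groups:
        TARGET1_GROUP : o_group = 0    -- rows 1-5, 11-15, ...
        TARGET2_GROUP : o_group = 1    -- rows 6-10, 16-20, ...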
ANS:
(I) I guess the SQL transformation can come to your rescue here.
In the Filter, check the condition count1 > count2. After that you can use a SQL transformation and write your SELECT query there.
54) hi,
in a.txt
-------------
C1,C2,C3
1,2,3
in b.txt
-----------
C1,C2,C3
2,3,4
like this up to n.txt
i want to insert my data into a target as
C1 C2 C3 C4
1 2 3 a.txt(source file name)
2 3 4 b.txt( " )
...................
..................
Briefly,
I want to load into the target
all the data from the sources (a.txt, b.txt, ..., n.txt) as well as the source file name,
and
if I add another (n+1)th file to the source system, it should also be added to the target.
ANS:
(I) We have the option "Add Currently Processed Flat File Name Port" in the flat file source definition; if you check that option you will achieve your desired output.
(II) Hi,
First read your input files using the Indirect source file type option in the session; then, with the "currently processed file name" port enabled in the flat file source properties, you will achieve the desired output. For the indirect option, check the Informatica help.
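For reference, with the Indirect source file type the file named in the session is just a list file containing one data file path per line, for example (the paths are illustrative):

    /data/src/a.txt
    /data/src/b.txt
    /data/src/c.txt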
ANS:
(I) Push and pull strategies determine how data comes from the source system to the ETL server.
Push: in this case the source system pushes data to (i.e., sends data to) the ETL server.
Pull: in this case the ETL server pulls data from (i.e., gets data from) the source system.
(II) A fact table without any measures is called a factless fact table.
(III) A "junk" dimension is a collection of random transactional codes, flags, and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A good example would be a trade fact in a company that brokers equity trades.
Hi guys, can someone help me on the conditions under which we prefer pre- and post-SQL overrides, and what their advantages and disadvantages are?
ANS:
(I) If you want to do anything on the table before data loading or after data loading, we use pre- and post-SQL.
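A typical use, sketched below with an illustrative index name (not from the original post), is to disable an index before a bulk load and rebuild it afterwards:

    -- Pre SQL (runs before the load)
    ALTER INDEX trg_sales_idx UNUSABLE;

    -- Post SQL (runs after the load)
    ALTER INDEX trg_sales_idx REBUILD;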
What is the main difference between the source filter which we give in the Source Qualifier and the filter condition which we add by overriding the Source Qualifier query?
ANS:
(I)
Filter
Hi,
If you set the filter at the SQ level, it limits the source data (it filters the data at the source level, so it increases the performance). If you use the filter condition with the Filter transformation, it limits the target rows (it retrieves all the rows from the source, then applies the filter condition, and then loads the data into the target); it takes some time to retrieve the records from the source, so it will increase the session time.
(II) Paluri was not talking about the difference between the source filter in the Source Qualifier transformation and the filter in the Filter transformation.
His question was about the Source Qualifier source filter versus a SQL override of the source with a filter condition.
Both are similar: in the source filter you can only give a filter condition to filter out unwanted records, or for testing purposes,
whereas the SQL override of the Source Qualifier is there to customize the default SQL which is generated by the Source Qualifier.
(III) Is there any difference in the way they execute? I mean, do both filter conditions execute at the database level?
(IV) The purpose of both is the same, but they execute at different levels.
The SQL override executes at the database level, whereas a plain filter condition executes at the Informatica level: the Source Qualifier reads all the data from the database and then passes the records based on the filter condition.
(V) If you observe the session log, the SELECT query which is issued to the database will include the WHERE clause (the WHERE clause will contain the condition which we added in the source filter of the Source Qualifier transformation).
As per my understanding, the source filter condition is added to the SELECT statement first and then the statement is issued to the database.
(VI) For your clarity, try to run a mapping with a SQL override containing a WHERE clause and no filter, and then run the same mapping with just a filter condition (no SQL override); then you will understand the difference.
(VII) If we run with just a filter condition (no SQL override), then observe the log for the SQL query which is issued to the database: it will include just the filter condition.
If we override the query, then it will take the overridden query.
(VIII) If we don't give any filter condition or SQL override, the log shows the default SQL query without any customizations.
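To make the distinction concrete, a rough illustration (the table and column names are examples only): with a source filter of deptno = 10 the Source Qualifier still generates its default query and simply appends the condition, whereas a SQL override replaces the generated query entirely.

    -- Default generated query plus source filter
    SELECT emp.empno, emp.ename, emp.deptno FROM emp WHERE emp.deptno = 10

    -- SQL override written by the developer (fully customized)
    SELECT e.empno, e.ename, e.deptno
    FROM   emp e
    WHERE  e.deptno = 10
    AND    e.hiredate > SYSDATE - 30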
Hi Friends,
ANS:
(I) Your question and what you want to achieve are totally different...
Hi guys, can anybody explain why the Union is an active transformation? I tried the Union but it does not delete any duplicate records.
ANS:
First, we know a transformation is said to be active if it changes the number of rows that pass through it. The Union combines two tables of similar structure, so there is a change in the number of records; hence it is active.
If you want to eliminate duplicate records in SQL you use UNION only; otherwise you use UNION ALL to display the records with duplicates.
3) Scenario: I have 3 tables. The 1st table has Emp ID; the 2nd table has Telephone Number, Address, Location; and the 3rd table has Bank Name, Account Number. I want all the 3 tables in one target table (all columns converted to rows; note that the table is in denormalized form). ** EmpID is common to all 3 tables.
Other part of the question
Scenario: Can I solve the above problem with an unconnected lookup?
Other part
Scenario: If we are using a Joiner, what is the join condition?
4) Scenario: I want to update the target table only, without using a dynamic cache.
5) Scenario: I have used a SQL override in a lookup. There are 5 ports from Col A to Col D, but I have used the override for the first 3 columns (Col A, Col B, Col C) with order by Col B, and the mapping is validated. When I run the mapping the session throws an error that the SQL override is not valid.
6) Scenario: I have to join tables; is it better to use a SQL override, a lookup, or a Joiner? Performance-wise, which is better?
8) Scenario: In a workflow I have a reusable session, and the same session is reused in another workflow. Does a change made in either of the sessions reflect in the other sessions?
9) Can we make changes in reusable and non-reusable sessions often?
10) Scenario: In my mapping the Update Strategy is not updating.
ANSWERS:
1> Direct: when we want to load data from 1 flat file only.
Indirect: when we want to load data from 2 or more flat files with the same data structure.
2> Static cache, dynamic cache, persistent cache, recache (refresh) cache, shared cache.
4> Use a Filter after the dynamic-cache lookup and give the filter condition NewLookupRow = 0.
5> After the SQL override add "--" (without quotes), else the override will not work.
6> If the tables are from the same database use the Source Qualifier, else use a Joiner. A lookup is not a good option.
8> When you make changes to a reusable task definition in the Task Developer, the changes reflect in the instance of the task in the workflow only if you have not edited the instance.
10> Check whether you have set the option "Treat Source Rows As" to Update in the session properties or not. Set it to Update.
3> We need to join the three tables using Joiners. We need 2 Joiner transformations to join them. The join condition would be empid of one table = empid of another table.
The other option is to set "Treat Source Rows As" to Update in the session properties.
2) In the source we have 1000 rows and I have 3 targets. The first 100 rows have to go into the 1st target, the next 200 rows into the 2nd target, and the rest of the rows into the 3rd target.
3) I have some duplicate rows in the source table and I have 2 targets; the unique records have to be loaded into the 1st target and the duplicate records into the 2nd target.
4) I have Empno, Name, Loc in the source table, and we have two targets: Tgt_India and Tgt_USA. When an employee moves from India to the USA, the row for that employee must be inserted into the USA target and deleted from the Indian target, and vice versa.
SOLUTIONS
1> Skip the first row to delete the header. Not sure about the footer.
2> Use an Expression transformation after the source and use a mapping variable with the Count aggregation type. Then use a Router to filter the records into 3 groups and load them into the 3 target tables. (REFER TO THE FLATFILES FOLDER MAPPING)
4> Insert the row into the appropriate table using a Router and use a post-session success command to delete the row from the other table. (REFER TO THE FLATFILES MAPPING)
SQL STATEMENT:
1> delete from emp_india a
   where empno = (select empno from emp_usa b
                  where a.empno = b.empno);
Aim of Informatica:
Informatica is an ETL tool used to extract data from OLTP sources, apply transformations to cleanse the data before applying the business logic, and then load it into the star tables (warehouse).
ARCHITECTURE
Data Warehouse Architecture
Informatica Architecture
SCHEMA
1. Star Schema
STAR SCHEMA
Star schema architecture is the simplest data warehouse design. The main feature of a star schema is a table at the center, called the fact table, surrounded by the dimension tables, which allow browsing of specific categories, summarizing, drill-downs, and specifying criteria.
Typically, most of the fact tables in a star schema are in database third normal form, while dimension tables are de-normalized (second normal form).
Fact table
Typical fact tables in a global enterprise data warehouse are (apart from those, there may be some company- or business-specific fact tables):
Dimension table
SNOWFLAKE SCHEMA
Snowflake schema architecture is a more complex variation of the star schema design. The main difference is that the dimension tables in a snowflake schema are normalized, so they have a typical relational database design.
Snowflake schemas are generally used when a dimension table becomes very big and when a star schema cannot represent the complexity of a data structure. For example, if a PRODUCT dimension table contains millions of rows, the use of a snowflake schema should significantly improve performance by moving some data out to another table (with BRANDS, for instance).
The problem is that the more normalized the dimension tables are, the more complicated the SQL joins that must be issued to query them. This is because, in order for a query to be answered, many tables need to be joined and aggregates generated.
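As a small sketch of the difference described above (table and column names are illustrative only), the same PRODUCT dimension could be modelled either way:

    -- Star schema: one denormalized product dimension
    CREATE TABLE dim_product (
      product_key  NUMBER PRIMARY KEY,
      product_name VARCHAR2(100),
      brand_name   VARCHAR2(100),
      category     VARCHAR2(100)
    );

    -- Snowflake schema: brand attributes moved out to an outrigger table
    CREATE TABLE dim_brand (
      brand_key  NUMBER PRIMARY KEY,
      brand_name VARCHAR2(100)
    );
    CREATE TABLE dim_product_sf (
      product_key  NUMBER PRIMARY KEY,
      product_name VARCHAR2(100),
      brand_key    NUMBER REFERENCES dim_brand(brand_key)
    );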
GALAXY SCHEMA
For each star schema or snowflake schema it is possible to construct a fact constellation schema.
This schema is more complex than star or snowflake architecture, which is because it contains
multiple fact tables. This allows dimension tables to be shared amongst many fact tables.
That solution is very flexible, however it may be hard to manage and support.
The main disadvantage of the fact constellation schema is a more complicated design because
many variants of aggregation must be considered.
In a fact constellation schema, different fact tables are explicitly assigned to the dimensions,
which are for given facts relevant. This may be useful in cases when some facts are associated
with a given dimension level and other facts with a deeper dimension level.
Use of that model should be reasonable when for example, there is a sales fact table (with details
down to the exact date and invoice header id) and a fact table with sales forecast which is
calculated based on month, client id and product id.
In that case using two different fact tables on a different level of grouping is realized through a
fact constellation model.
Normalization
What is Normalization?
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process:
First Normal Form (1NF) sets the very basic rules for an organized database.
Second Normal Form (2NF) further addresses the concept of removing duplicative data.
Third Normal Form (3NF) removes columns which are not dependent upon the primary key.
Next
Install
Go to Program Files
Informatica PowerCenter 7.1.1 > Repository Server > Repository Setup.
You see the prompt screen of the Repository Server with different options.
Set the "Server Port Number" to any value between 5002 and 65535.
Set the "administrative password" of your own choice. It is case sensitive.
Now we have successfully configured the Repository Server.
Start Services
Log in to your database.
Create two users, one for the Repository and the other for the Target database.
These are necessary to secure the data in the repositories.
For e.g., Repository (Username :: Rep_one, Password :: r) and Target (Username :: trg_wh, Password :: t).
After creating the users, test them by connecting to the database.
Previous
Go to All Programs
Informatica PowerCenter 7.1.1 > Informatica PowerCenter - Server > Informatica Server Setup
It prompts with a box... click on the Continue button.
Now you find a bigger prompt with a set of tabs with different options.
In the first tab, the "Server" tab:
Server Name :: temporary server name
TCP/IP Host Name :: computer name
Click on the "Repository" tab:
Repository Name :: name of the repository created on the console side
Repository User :: username of the Repository
Repository Password :: password of the Repository
Repository Server Host Name :: computer name
Now move to the "License" tab:
Enter the Option key, click on Update.
Enter the Connectivity key, click on Update.
Click on OK.
REPOSITORY MANAGER
Actions
DESIGNER MENU
DESIGNER OVERVIEW
Design Mappings
Represent how to move data from source to target.
Design Mapplets
Create Reusable and Non-Reusable Transformation
Access multiple Repositories and folders templates/tables at a time.
Many more features like Data Profiling , Propagate , Debugger ,Versioning etc.,
DESIGNER MENU
Important Designer Tools
Source Analyzer
Create Source Definition - Import data from Flat Files/ Relational/ Application/ XML/ COBOL
Warehouse Designer
Transformation Developer
Mapplet Designer
Mapping Designer
Create mappings - represent how the data is to be moved from the source to the target table. A mapping consists of source definitions, mapplets, transformations, and target definitions.
MAPPLETS
If you need a particular set of transformations which use the same logic in multiple mappings, a mapplet lets you reuse that group of transformations across those mappings.
Create Mapplet
1. You can create mapplets in the Mapplet Designer tool.
2. A Mapplet Input transformation is used only when you don't want to use a source definition in the Mapplet Designer.
3. A Mapplet Output transformation is always used whenever you create a mapplet.
4. Example mapplet flows:
Source > Sorter > Expression > Mapplet Output
Mapplet Input > Sorter > Expression > Mapplet Output
Advantages
Include source definitions. You can use multiple source definitions and source qualifiers
to provide source data for a mapping.
Accept data from sources in a mapping. If you want the mapplet to receive data from the
mapping, you can use an Input transformation to receive source data.
Include multiple transformations. A mapplet can contain as many transformations as you
need.
Pass data to multiple transformations. You can create a mapplet to feed data to multiple
transformations. Each Output transformation in a mapplet represents one output group in
a mapplet.
Contain unused ports. You do not have to connect all mapplet input and output ports in a
mapping.
Limitations
VERSIONING
Slowly Changing Dimension mapping types:
SCD Type 1 (Slowly Changing Dimension): Inserts new dimensions. Overwrites existing dimensions with changed dimensions. (Shows current data.)
SCD Type 2 / Version Data (Slowly Changing Dimension): Inserts new and changed dimensions. Creates a version number and increments the primary key to track changes.
SCD Type 2 / Flag Current (Slowly Changing Dimension): Inserts new and changed dimensions. Flags the current version and increments the primary key to track changes.
SCD Type 2 / Date Range (Slowly Changing Dimension): Inserts new and changed dimensions. Creates an effective date range to track changes.
SCD Type 3 (Slowly Changing Dimension): Inserts new dimensions. Updates changed values in existing dimensions. Optionally uses the load date to track changes.
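As an illustrative sketch of what an SCD Type 2 / Date Range dimension might look like physically (the table and column names below are examples only, not from the notes):

    CREATE TABLE dim_customer (
      customer_key   NUMBER PRIMARY KEY,  -- surrogate key, new value for each version
      customer_id    NUMBER,              -- natural key from the source system
      customer_name  VARCHAR2(100),
      effective_date DATE,                -- start of the version's validity
      end_date       DATE                 -- NULL (or a high date) for the current version
    );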
Data Profiling
Data profiling is a technique used to analyze source data. PowerCenter Data Profiling can help
you evaluate source data and detect patterns and exceptions. PowerCenter lets you profile source
data to suggest candidate keys, detect data patterns, evaluate join criteria, and determine
information, such as implicit datatype.
You can use Data Profiling to analyze source data in the following situations:
VERSIONING
Advantage of Versioning ?
A repository enabled for version control maintains an audit trail of version history. It stores
multiple versions of an object as you check out, modify, and check it in. As the number of
versions of an object grows, you may want to view the object version history. You may want to
do this for the following reasons:
Determine what versions are obsolete and no longer necessary to store in the repository.
Troubleshoot changes in functionality between different versions of metadata.
WORKFLOW MANAGER
Actions
Create Reusable tasks , Worklets , Workflows.
Schedule Workflows.
Configure tasks.
Workflow
A workflow is a set of instructions that describes how and when to run tasks related to extracting,
transforming, and loading data.
Worklets
A worklet is an object that represents a set of tasks.
TASKS
There are many tasks available , which are used to create workflows and worklets.
Types of Tasks
WORKFLOW MONITOR
You can monitor workflows and tasks in the Workflow Monitor. View details about a workflow
or task in Gantt Chart view or Task view.
Actions
You can run, stop, abort, and resume workflows from the Workflow Monitor.
You can view the log file and Performance Data
TRANSFORMATIONS
Transformation | Active/Passive | Description
SORTER | Active | Sorts the tables in ascending or descending order and can also be used to obtain distinct records.
You can use mapping parameters and variables to make mappings more flexible.
You can reuse a mapping by varying the parameters and variables.
Representation:
$$parametername / $$variablename
Parameters A mapping parameter represents a constant value that you can define before running
a session. A mapping parameter retains the same value throughout the entire session.
Variables
A mapping variable represents a value that can change through the session. The PowerCenter
Server saves the value of a mapping variable to the repository at the end of each successful
session run and uses that value the next time you run the session.
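A mapping parameter or variable usually receives its initial value from a parameter file; a minimal sketch of one (the folder, workflow, session, and parameter names are placeholders):

    [MyFolder.WF:wf_load_sales.ST:s_m_load_sales]
    $$LoadDate=2010-09-16
    $$Region=SOUTH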
DEBUGGER
Actions
You can debug a valid mapping to gain troubleshooting information about data and error
conditions.
o Before you run a session. After you save a mapping, you can run some initial
tests with a debug session before you create and configure a session in the Workflow
Manager.
o After you run a session. If a session fails or if you receive unexpected results in
your target, you can run the Debugger against the session.
2. What is a junk dimension? What is the difference between a junk dimension and a degenerate dimension?
A "junk" dimension is a collection of random transactional codes, flags, and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes, whereas a degenerate dimension is data that is dimensional in nature but stored in the fact table.
Junk dimension: the columns which we use rarely or not at all, formed into a dimension, make a junk dimension.
Degenerate dimension: the columns which we take from the source table itself and use as a dimension form a degenerate dimension.
For example, if we take only the columns empno and ename from the EMP table and form a dimension, this is called a degenerate dimension.
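As an illustrative sketch of a junk dimension table (the names and values are examples only):

    CREATE TABLE dim_junk (
      junk_key       NUMBER PRIMARY KEY,  -- surrogate key referenced from the fact table
      payment_flag   CHAR(1),             -- Y / N
      gift_wrap_flag CHAR(1),             -- Y / N
      order_channel  VARCHAR2(10)         -- WEB / PHONE / STORE
    );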
A fact table consists of measurements of business requirements and foreign keys of dimensions
tables as per business rules.
There can just be SKs within a star schema, which itself is de-normalized. Now, if there were FKs on the dimensions as well, I would agree. Being in normal form, more granularity is achieved with less coding, i.e., fewer joins while retrieving the fact.
The basic difference is that E-R modeling has both a logical and a physical model, while dimensional modeling has only a physical model. E-R modeling is used for normalizing the OLTP database design; dimensional modeling is used for de-normalizing the ROLAP/MOLAP design. Adding to the point:
E-R modeling revolves around the entities and their relationships to capture the overall process of the system.
In E-R modeling the data is in normalized form, so there are more joins, which may adversely affect system performance, whereas in dimensional modeling the data is denormalized, so there are fewer joins, by which system performance will improve.
Conformed dimensions are dimensions which can be used across multiple data marts in combination with multiple fact tables accordingly.
Conformed facts are allowed to have the same name in separate tables and can be combined and compared mathematically. Conformed dimensions are those tables that have a fixed structure; there is no need to change the metadata of these tables and they can go along with any number of facts in that application.
A dimension table which is used by more than one fact table is known as a conformed dimension.
1. Ralph Kimball Model (Bottom-Up approach :: Data Marts --> Data Warehouse)
The Kimball model is always structured as a denormalized (dimensional) structure.
Data validation is done to make sure that the loaded data is accurate and meets the business requirements. Strategies are the different methods followed to meet the validation requirements.
A surrogate key is the primary key for a dimension table; it is a substitution for the natural primary key.
Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the dimension tables' primary keys. They can use the Informatica Sequence Generator, an Oracle sequence, or SQL Server identity values for the surrogate key.
It is useful because the natural primary key (i.e., Customer Number in the Customer table) can change, which makes updates more difficult; it is also used in SCDs to preserve historical data.
10. What is meant by metadata in context of a Data warehouse and how it is important?
Metadata or Meta data is data about data. Examples of metadata include data element
descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions,
and process/method descriptions. The repository environment encompasses all corporate
metadata resources: database catalogs, data dictionaries, and navigation services. Metadata
includes things like the name, length, valid values, and description of a data element. Metadata is
stored in a data dictionary and repository. It insulates the data warehouse from changes in the
schema of operational systems. Metadata Synchronization The process of consolidating, relating
and synchronizing data elements with the same or similar meaning from different systems.
Metadata synchronization joins these differing elements together in the data warehouse to allow for
easier access.
In the context of a data warehouse, metadata means the information about the data. This information is stored in the designer repository. Metadata is the data about data; the business analyst or data modeler usually captures information about data - the source (where and how the data originated), the nature of the data (char, varchar, nullable, existence, valid values, etc.), and the behavior of the data (how it is modified/derived and its life cycle) - in a data dictionary, a.k.a. metadata. Metadata is also present at the data mart level: subsets, facts and dimensions, ODS, etc. For a DW user, metadata provides vital information for analysis / DSS.
12. What is the main difference between schema in RDBMS and schemas in Data
Warehouse?
RDBMS Schema:
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Cannot easily solve extract and complex problems
* Poorly modelled
DWH Schema:
* Used for OLAP systems
* New generation schema
* De-normalized
* Easy to understand and navigate
* Extract and complex problems can be easily solved
* Very good model
In Dimensional Modeling, Data is stored in two kinds of tables: Fact Tables and Dimension
tables.
Fact Table contains fact data e.g. sales, revenue, profit etc.....
Dimension table contains dimensional data such as Product Id, product name, product
description etc.....
Dimensional Modeling is a design concept used by many data warehouse designers to build their
data warehouse. In this design model all the data is stored in two types of tables - Facts table and
Dimension table. Fact table contains the facts/measurements of the business and the dimension
table contains the context of measurements i.e., the dimensions on which the facts are calculated.
The data model is also detailed enough to be used by the database developers to use as a
"blueprint" for building the physical database. The information contained in the data model will
be used to define the relational tables, primary and foreign keys, stored procedures, and triggers.
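As a rough illustration (table and column names are made up), the fact and dimension tables described
above could be defined like this:
CREATE TABLE dim_product (
    product_sk   NUMBER PRIMARY KEY,   -- surrogate key
    product_id   VARCHAR2(20),         -- natural key from the source
    product_name VARCHAR2(100),
    product_desc VARCHAR2(400)
);
CREATE TABLE fact_sales (
    date_sk     NUMBER,                                       -- FK to a date dimension
    product_sk  NUMBER REFERENCES dim_product (product_sk),   -- FK to the dimension above
    store_sk    NUMBER,                                       -- FK to a store dimension
    sales_amt   NUMBER(12,2),                                 -- measure
    qty_sold    NUMBER                                        -- measure
);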
A poorly designed database will require more time in the long-term. Without careful planning
you may create a database that omits data required to create critical reports, produces results that
are incorrect or inconsistent, and is unable to accommodate changes in the user's requirements.
It also determines the amount of space required for the database. Level of granularity indicates the
extent of aggregation that will be permitted on the fact data. More granularity (a finer grain) implies
more aggregation potential, and vice versa. In simple terms, the level of granularity defines the
extent of detail. As an example, consider a geographical level of granularity: we may analyze data at
the levels of COUNTRY, REGION, TERRITORY, CITY and STREET; in this case the finest (most detailed) level
of granularity is STREET. The level of granularity is thus the level of the hierarchy down to which we
can see/drill the data in the fact table.
In a data warehouse the time dimension is loaded manually. Every data warehouse maintains a time
dimension, kept at the most granular level at which the business runs (e.g. week, day of the month and
so on). Depending on the data loads, these time dimensions are updated: a weekly process gets updated
every week and a monthly process every month.
18. Difference between Snowflake and Star Schema. What are situations where Snow flake
Schema is better than Star Schema to use and when the opposite is true?
Star schema and snowflake both serve the purpose of dimensional modeling when it comes to
data warehouses.
Star schema is a dimensional model with a fact table (large) and a set of dimension tables
(small). The whole set-up is totally denormalized.
However, in cases where the dimension tables are split into many tables, that is, where the schema is
slightly inclined towards normalization (to reduce redundancy and dependency), we get the snowflake
schema.
The nature/purpose of the data that is to be fed into the model is the key to the question of which is
better.
Star schema
contains the dimension tables mapped around one or more fact tables.
It is a denormalized model.
No need to use complicated joins.
Queries return results fast.
Snowflake schema
Sometimes we need to carve separate dimensions out of existing dimensions; in that case we go for a
snowflake schema.
Disadvantage of snowflake: query performance is lower because more joins are needed, as the sketch
below shows.
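A small sketch of the difference (hypothetical tables): the snowflake query needs one extra join to
reach the same attribute.
-- Star schema: the category sits in the denormalized product dimension.
SELECT d.product_category, SUM(f.sales_amt)
FROM   fact_sales f
JOIN   dim_product d ON d.product_sk = f.product_sk
GROUP  BY d.product_category;
-- Snowflake schema: the category is split into its own table, so an extra join is needed.
SELECT c.category_name, SUM(f.sales_amt)
FROM   fact_sales f
JOIN   dim_product  d ON d.product_sk  = f.product_sk
JOIN   dim_category c ON c.category_sk = d.category_sk
GROUP  BY c.category_name;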
Conformed dimensions are the dimensions which can be used across multiple data marts in
combination with multiple fact tables accordingly.
Conformed facts are allowed to have the same name in separate tables and can be combined and
compared mathematically. Conformed dimensions are those tables that have a fixed structure; there is
no need to change the metadata of these tables, and they can go along with any number of fact tables
in that application without any changes.
A dimension table which is used by more than one fact table is known as a conformed dimension.
They are dimension tables in a star schema data mart that adhere to a common structure, and
therefore allow queries to be executed across star schemas. For example, the Calendar dimension
is commonly needed in most data marts. By making this Calendar dimension adhere to a single
structure, regardless of which data mart it is used in within your organization, you can query by
date/time from one data mart to another.
Conformed dimensions are dimensions which are common to multiple cubes (a cube being a schema
containing fact and dimension tables).
For example, if Cube-1 contains F1, D1, D2, D3 and Cube-2 contains F2, D1, D2, D4 as facts and
dimensions, then D1 and D2 are the conformed dimensions.
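A sketch of how a conformed date dimension lets us compare two star schemas (all names hypothetical):
each fact is aggregated separately and the results are joined on the shared dimension attribute.
SELECT s.calendar_month, s.sales, r.returns
FROM  (SELECT d.calendar_month, SUM(f.sales_amt) AS sales
       FROM   fact_sales f JOIN dim_date d ON d.date_sk = f.date_sk
       GROUP  BY d.calendar_month) s
JOIN  (SELECT d.calendar_month, SUM(f.return_amt) AS returns
       FROM   fact_returns f JOIN dim_date d ON d.date_sk = f.date_sk
       GROUP  BY d.calendar_month) r
  ON  r.calendar_month = s.calendar_month;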
A dimension table in a data warehouse is a table whose entries describe data in a fact table; dimension
tables contain the data from which dimensions are created. A fact table in a data warehouse describes
the transaction data; it contains characteristics and key figures.
24. What are semi-additive and factless facts, and in which scenarios would you use such kinds
of fact tables?
Semi-additive: semi-additive facts are facts that can be summed up over some of the dimensions
in the fact table, but not over the others. For example, suppose Current Balance and Profit Margin are
the facts. Current Balance is a semi-additive fact: it makes sense to add it up across all accounts
(what is the total current balance for all accounts in the bank?), but it does not make sense to add it
up through time (adding up all current balances for a given account for each day of the month does not
give us any useful information).
Factless: a factless fact table contains no measures at all, only the foreign keys of the dimensions;
it is used to record events or coverage, for example student attendance or a product being on promotion.
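As a sketch (hypothetical snapshot fact table), the two directions of summing the semi-additive fact
look like this:
-- Summing the balance across accounts for one day is meaningful.
SELECT SUM(current_balance)
FROM   fact_account_snapshot
WHERE  snapshot_date = DATE '2024-01-31';
-- Summing it through time for one account is not; an average (or the last value) is used instead.
SELECT AVG(current_balance)
FROM   fact_account_snapshot
WHERE  account_sk = 1001
AND    snapshot_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31';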
Conventional load: before loading the data, all the table constraints are checked against the data.
Direct load (faster loading): all the constraints are disabled and the data is loaded directly; later
the data is checked against the table constraints and the bad data is not indexed. The conventional and
direct load methods are applicable only to Oracle; this naming convention is not a general one
applicable to other RDBMSs such as DB2 or SQL Server.
Aggregate tables contain redundant data that is summarized from other data in the warehouse.
These are the tables which contain aggregated/summarized data, e.g. yearly or monthly sales
information, and they are used to reduce query execution time.
An aggregate table contains the summary of existing warehouse data, grouped to certain levels of
dimensions. Retrieving the required data from the actual table, which may have millions of records,
takes more time and also affects server performance. To avoid this we can aggregate the table to the
required level and use it. These tables reduce the load on the database server, improve query
performance and return results very quickly.
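For example, a monthly summary could be built from a detailed sales fact table roughly as follows
(hypothetical names); reports then read the small summary instead of millions of detail rows:
CREATE TABLE agg_sales_monthly AS
SELECT d.calendar_year,
       d.calendar_month,
       f.product_sk,
       SUM(f.sales_amt) AS total_sales,
       SUM(f.qty_sold)  AS total_qty
FROM   fact_sales f
JOIN   dim_date   d ON d.date_sk = f.date_sk
GROUP  BY d.calendar_year, d.calendar_month, f.product_sk;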
A dimensional table is a collection of hierarchies and categories along which the user
can drill down and drill up. it contains only the textual attributes.
25. Why are OLTP database designs not generally a good idea for a Data Warehouse
OLTP cannot store historical information about the organization. It is used for storing the details
of daily transactions while a datawarehouse is a huge storage of historical information obtained
from different datamarts for making intelligent decisions about the organization.
26. What is the need of surrogate key; why primary key not used as surrogate key
A surrogate key is an artificial identifier for an entity. Surrogate key values are generated by the
system sequentially (like the Identity property in SQL Server or a sequence in Oracle). They do not
describe anything.
A primary key is a natural identifier for an entity. Primary key values are entered by the user and
uniquely identify each row; there is no repetition of data.
If a column is made a primary key and later the data type or length of that column needs to change,
then all the foreign keys that depend on that primary key must also be changed, making the database
unstable.
Surrogate keys make the database more stable because they insulate the primary and foreign key
relationships from changes in data types and lengths.
For example: you are extracting customer information from an OLTP source and, after the ETL process,
loading it into a dimension table (DW). For SCD Type 1 you can use the source CustomerID as the
primary key of the dimension table. But if you would like to preserve the customer's history in the
dimension table, i.e. Type 2, you need another unique number apart from CustomerID; there you have to
use a surrogate key.
Another reason: if CustomerID is alphanumeric, you have to use a surrogate key in the dimension table.
It is advisable to have a system-generated small integer number as the surrogate key in the dimension
table, so that indexing and retrieval are much faster, as the sketch below illustrates.
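A minimal sketch of such a Type 2 dimension (hypothetical names): the surrogate key lets the same
CustomerID appear in several history rows.
CREATE TABLE dim_customer (
    customer_sk   NUMBER PRIMARY KEY,   -- surrogate key, sequence-generated
    customer_id   VARCHAR2(20),         -- natural key from the OLTP source
    customer_name VARCHAR2(100),
    city          VARCHAR2(50),
    eff_start_dt  DATE,
    eff_end_dt    DATE,
    current_flag  CHAR(1)
);
-- Two history rows for the same customer after a change of city:
--   (1, 'C100', 'Smith', 'Dallas', 01-JAN-2020, 14-JUN-2023, 'N')
--   (2, 'C100', 'Smith', 'Austin', 15-JUN-2023, NULL,        'Y')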
Data cleansing: the act of detecting and removing and/or correcting a database's dirty data (i.e.,
data that is incorrect, out of date, redundant, incomplete, or formatted incorrectly).
It can be done using existing ETL tools or third-party tools such as Trillium.
Data Mart is a segment of a data warehouse that can provide data for reporting and analysis on a
section, unit, department or operation in the company, e.g. sales, payroll, production. Data marts
are sometimes complete individual data warehouses which are usually smaller than the corporate
data warehouse.
Data mart: a data mart is a small data warehouse. In general, a data warehouse is divided into
smaller units according to the business requirements. For example, the data warehouse of an
organization may be divided into the following individual data marts; data marts are used to improve
performance during data retrieval.
e.g.: Data Mart of Sales, Data Mart of Finance, Data Mart of Marketing, Data Mart of HR, etc.
But we can use these degenerate dimensions as a primary key of the fact table.
The data items that are not facts, and the data items that do not fit into the existing dimensions, are
termed degenerate dimensions.
Degenerate dimensions are used when fact tables represent transactional data.
32. Give examples of degenerate dimensions
Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK), and POS
Transaction Number
• Identifying sources
• Identifying facts
• Defining dimensions
• Defining attributes
2. Once the business requirements are clear, identify the grains (levels).
3. Once the grains are defined, design the dimension tables with the lowest-level grains.
4. Once the dimensions are designed, design the fact table with the key performance indicators
(facts).
5. Once the dimension and fact tables are designed, define the relationships between the tables using
primary and foreign keys. In the logical phase the database design looks like a star, so it is named
the star schema design.
In this architecture end users access data that is derived from several sources through the data
warehouse.
Whenever the data derived from the sources needs to be cleaned and processed before being put into the
warehouse, a staging area is used.
When the warehouse architecture is customized for different groups in the organization, data marts are
added and used.
Architecture: Source --> Staging Area --> Warehouse --> Data Marts --> End Users
Q5>How to find the no. of rows committed to the target when a session fails.
Ans :: Log file
Q6>How to remove the duplicate records from flat file (other than using sorter trans. and
mapping variables)
Ans :: (i)Dynamic Lookup (ii) sorter and aggregator
Q7>How to generate a sequence of values when the target has more than 2 billion records (with the
Sequence Generator we can generate only up to 2 billion values).
Ans :: Create a stored procedure (or function) at the database level and call it using the Stored
Procedure transformation, as sketched below.
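A rough sketch of the database side (hypothetical names): an Oracle sequence is not limited to 2
billion, and a small function wrapping it can be called from the Stored Procedure transformation.
CREATE SEQUENCE big_key_seq START WITH 1 INCREMENT BY 1 MAXVALUE 999999999999;

CREATE OR REPLACE FUNCTION next_big_key RETURN NUMBER IS
    v_key NUMBER;
BEGIN
    SELECT big_key_seq.NEXTVAL INTO v_key FROM dual;   -- works on older Oracle versions too
    RETURN v_key;
END;
/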
Q8>I have to generate a target field in Informatica which doesn't exist in the source table. It is the
batch number. There are 1000 rows altogether. The first 100 rows should have the same batch
number 100, the next 100 should have batch number 101, and so on. How can we do this using
Informatica?
Ans :: develop a mapping flow
Source > Sorter > Sequence Generator (generate numbers) > Expression (batch number, derived from the
generated number with DECODE or arithmetic) > Target
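The batch-number arithmetic itself can be sketched in SQL (hypothetical table and column names); the
same expression, applied to the Sequence Generator output in the Expression transformation, gives 100
for rows 1-100, 101 for rows 101-200, and so on:
SELECT s.*,
       100 + FLOOR((ROW_NUMBER() OVER (ORDER BY s.order_key) - 1) / 100) AS batch_number
FROM   src_table s;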
Q9>Let's say we have a flat file in the source system and it is in the correct path; when we ran the
workflow we got the error "File Not Found". What might be the reason?
Ans :: The "source file name" was not entered properly at the session level.
target Definition ::
Store_id, Item, Qty, Price
101, battery, 3, 2.99
101, battery, 1 , 3.19
101, battery, 2, 2.59
Source Definition::
101, battery, 3, 2.99
101, battery, 1 , 3.19
101, battery, 2, 2.59
101, battery, 2,17.34
Ans :: Source > Aggregator (group by store_id, item, qty) > Target
Tip :: if Sorted Input is not used, the Aggregator does not preserve the input row order; by default it
returns the last row of each group for ports without an aggregate function.
Q12> If the default query is not generated in the Source Qualifier, what is the reason? How do we
solve it?
Ans :: (i) If the source is a flat file, you cannot use this feature in the Source Qualifier.
(ii) If you are using a relational source and you forget to make the connection from the Source
Qualifier to the next transformation, you cannot generate the SQL query.
ABBREVIATIONS
BI Business Intelligence
BO Business Object
C/S Client/Server
DM Data Modeling
DW Data Warehouse
ERD Entity Relationship Diagram
GB Gigabytes
MB Megabytes
OS Operating System
QA Quality Assurance
TB Terabytes
ANS:
Manages session and batch scheduling: when you start the Informatica server, the Load Manager launches
and queries the repository for a list of sessions configured to run on the Informatica server. When you
configure a session, the Load Manager maintains the list of sessions and session start times. When you
start a session, the Load Manager fetches the session information from the repository to perform
validations and verifications prior to starting the DTM process.
Locking and reading the session: when the Informatica server starts a session, the Load Manager locks
the session in the repository. Locking prevents you from starting the same session again while it is
running.
Reading the parameter file: if the session uses a parameter file, the Load Manager reads the parameter
file and verifies that the session-level parameters are declared in the file.
Verifying permissions and privileges: when the session starts, the Load Manager checks whether or not
the user has the privileges to run the session.
Creating log files: the Load Manager creates the log file containing the status of the session.
ANS:
After the Load Manager performs validations for the session, it creates the DTM process. The DTM's role
is to create and manage the threads that carry out the session tasks. It creates the master thread, and
the master thread creates and manages all the other threads.
DTM means Data Transformation Manager. In Informatica this is the main background process; it runs
after the Load Manager completes. In this process the Informatica server looks up the source and target
connections in the repository and, if they are correct, fetches the data from the source and loads it
into the target.
ANS:
ANS:
Data movement modes determine how the Informatica server handles character data. You choose the data
movement mode in the Informatica server configuration settings.
Two data movement modes are available in Informatica:
ASCII mode
Unicode mode
What are the output files that the Informatica server creates during a session run?
ANS:
Informatica server log: the Informatica server (on UNIX) creates a log for all status and error messages
(default name: pm.server.log). It also creates an error log for error messages.
These files are created in the Informatica home directory:
Session log file: the Informatica server creates a session log file for each session. It writes
information about the session into the log file, such as the initialization process, the creation of
SQL commands for reader and writer threads, errors encountered and the load summary. The amount of
detail in the session log file depends on the tracing level that you set.
Session detail file: this file contains load statistics for each target in the mapping. Session details
include information such as the table name and the number of rows written or rejected. You can view
this file by double-clicking the session in the Monitor window.
Performance detail file: this file contains information known as session performance details, which
helps you identify where performance can be improved. To generate this file, select the performance
detail option in the session property sheet.
Reject file: this file contains the rows of data that the writer does not write to targets.
Control file: the Informatica server creates a control file and a target file when you run a session
that uses the external loader. The control file contains information about the target flat file, such
as the data format and loading instructions for the external loader.
Post-session email: post-session email allows you to automatically communicate information about a
session run to designated recipients. You can create two different messages: one if the session
completes successfully, the other if the session fails.
Indicator file: if you use a flat file as a target, you can configure the Informatica server to create
an indicator file. For each target row, the indicator file contains a number to indicate whether the
row was marked for insert, update, delete or reject.
Output file: if the session writes to a target file, the Informatica server creates the target file
based on the file properties entered in the session property sheet.
Cache files: when the Informatica server creates a memory cache, it also creates cache files.
The Informatica server creates index and data cache files for the following transformations:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
ANS:
What is polling?
ANS:
It displays the updated information about the session in the monitor window. The monitor
window displays the status of each session when you poll the informatica server.
-----------------------------------------------------------------------------------------------------------------------------------------
ANS:
ANS: NO.
ANS:
Session parameters, like mapping parameters, represent values you might want to change between
sessions, such as database connections or source files.
The Server Manager also allows you to create user-defined session parameters. The following are
user-defined session parameters:
Database connections
Source file name: use this parameter when you want to change the name or location of the
session source file between session runs.
Target file name: use this parameter when you want to change the name or location of the
session target file between session runs.
Reject file name: use this parameter when you want to change the name or location of the
session reject file between session runs.
ANS:
A parameter file is used to define the values for parameters and variables used in a session. A
parameter file is a text file created with a text editor such as WordPad or Notepad.
You can define the following values in a parameter file:
Mapping parameters
Mapping variables
Session parameters
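A small sketch of what such a parameter file might look like (folder, workflow, session and parameter
names here are hypothetical):
[MyFolder.WF:wf_load_sales.ST:s_m_load_sales]
$DBConnection_Source=ORA_SRC
$DBConnection_Target=ORA_DWH
$InputFile_Sales=/data/in/sales_20240131.dat
$$LOAD_DATE=2024-01-31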
ANS: The goal of performance tuning is to optimize session performance so that sessions run within the
available load window for the Informatica server. Session performance can be increased as follows.
The performance of the Informatica server is related to network connections. Data generally moves
across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times
faster. Network connections therefore often affect session performance, so avoid unnecessary network
hops.
Flat files: if your flat files are stored on a machine other than the Informatica server, move those
files to the machine the Informatica server runs on.
Relational data sources: minimize the connections to sources, targets and the Informatica server to
improve session performance. Moving the target database onto the server machine may improve session
performance.
Staging areas: if you use staging areas, you force the Informatica server to perform multiple data
passes. Removing staging areas may improve session performance.
You can run multiple Informatica servers against the same repository. Distributing the session load
across multiple Informatica servers may improve session performance.
Running the Informatica server in ASCII data movement mode improves session performance, because ASCII
data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes to store
a character.
If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve
performance. Also, single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from
optimization such as adding indexes.
We can improve session performance by configuring the network packet size, which controls how much data
crosses the network at one time. To do this, go to the Server Manager and choose the server to
configure database connections.
If your target has key constraints and indexes, they slow the loading of data. To improve session
performance in this case, drop the constraints and indexes before you run the session and rebuild them
after the session completes, for example as sketched below.
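For example (hypothetical index name), the pre- and post-session SQL could look roughly like this:
-- Pre-session SQL: make the index unusable before the bulk load.
ALTER INDEX idx_fact_sales_prod UNUSABLE;
-- Post-session SQL: rebuild it after the load completes.
ALTER INDEX idx_fact_sales_prod REBUILD;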
Running parallel sessions by using concurrent batches also reduces data loading time, so concurrent
batches may increase session performance.
Partitioning the session improves session performance by creating multiple connections to sources and
targets and loading data in parallel pipelines.
In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation
to improve session performance.
Avoid transformation errors to improve session performance.
If the session contains a Lookup transformation, you can improve session performance by enabling the
lookup cache.
If your session contains a Filter transformation, create that Filter transformation nearer to the
sources, or use a filter condition in the Source Qualifier (see the sketch below).
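For instance, instead of filtering downstream, the condition can be pushed into the Source Qualifier as
a source filter or SQL override (hypothetical table and column), so unwanted rows never leave the
database:
SELECT order_id, customer_id, order_amt
FROM   orders
WHERE  order_status = 'ACTIVE'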
Aggregator, Rank and Joiner transformations often decrease session performance because they must group
the data before processing it. To improve session performance in this case, use the sorted ports
option. Increasing the temporary database space also improves performance.
1) In a single flat file, if I have multiple delimiters, how can I load the flat file?
2) If a flat file is a "," delimited flat file and, while loading data, one of my fields or attributes
contains a "," in its value, how can I handle this case?
3) While loading multiple flat files using the indirect loading method, how can I generate the list
file if I have n number of flat files of similar structure?
4) While loading multiple flat files using indirect loading, I want to load the data into one target
and the file names into another target. How can this be done?
ANS:
1) According to my knowledge, check the answers below.
1) The flat file definition's Delimiters property can actually accept more than one delimiter character
(see the description below); if that does not fit the file layout, request the source data team to send
the file with a single delimiter.
2) A field value containing "," will be split incorrectly unless an optional quote character is used
around the field (see Optional Quotes below); otherwise, request the source system to change the
delimiter to some other character.
3) The list of files is appended to a file list using a UNIX script. Once all the files are FTPed to
the source directory, the UNIX script generates the file list with all the files that exist in the
source file directory.
4) In the mapping we can get the file name along with the field values (the CurrentlyProcessedFileName
port). Connect that file name field to a Sorter or Aggregator to get the distinct file names and load
them into the file-name target table or file; connect the remaining fields, other than the file name,
to the other target.
One or more characters used to separate columns of data. Delimiters can be either
printable or single-byte unprintable characters, and must be different from the escape
character and the quote character (if selected). To enter a single-byte unprintable
character, click the Browse button to the right of this field. In the Delimiters dialog box,
select an unprintable character from the Insert Delimiter list and click Add. You cannot
select unprintable multibyte characters as delimiters.
Maximum number of delimiters is 80.
Optional Quotes
Select No Quotes, Single Quote, or Double Quotes. If you select a quote character, the
Integration Service ignores delimiter characters within the quote characters. Therefore,
the Integration Service uses quote characters to escape the delimiter.
For example, a source file uses a comma as a delimiter and contains the following row:
342-3849, ‘Smith, Jenna’, ‘Rockville, MD’, 6.
If you select the optional single quote character, the Integration Service ignores the
commas within the quotes and reads the row as four fields.
The Designer adds the CurrentlyProcessedFileName port as the last column on the
Columns tab. The CurrentlyProcessedFileName port is a string port with default
precision of 256 characters.