Table of Contents
1 Introduction
1.1 Purpose
2 ORACLE
2.1 DEFINITIONS
NORMALIZATION
First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form
Fourth Normal Form
ORACLE SET OF STATEMENTS
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Querying Language (DQL)
Data Control Language (DCL)
Transactional Control Language (TCL)
Syntaxes
ORACLE JOINS
Equi Join/Inner Join
Non-Equi Join
Self Join
Natural Join
Cross Join
Outer Join
Left Outer Join
Right Outer Join
1 Introduction
1.1 Purpose
The purpose of this document is to provide detailed information about DWH concepts and Informatica, based on real-time training.
2 ORACLE
2.1 DEFINITIONS
Organizations can store data on various media and in different formats, such as a hard-copy document or a database. A database management system (DBMS) stores, retrieves, and modifies data in the database on request. There are four main types of databases: hierarchical, network, relational, and object-relational.
A row is in first normal form (1NF) if all underlying domains contain atomic
values only.
It does not have a composite primary key, meaning that the primary key cannot be subdivided into separate logical entities.
All the non-key columns are functionally dependent on the entire primary
key.
A row is in second normal form if, and only if, it is in first normal form and every non-key attribute is fully dependent on the key. For example, in ORDER_ITEM(order_id, item_id, item_desc, qty) with key (order_id, item_id), item_desc depends only on item_id, so it belongs in a separate ITEM table.
An entity is in Third Normal Form (3NF) when it meets the requirement of being in Second Normal Form (2NF) and, additionally, all of its non-key attributes depend directly on the primary key (no transitive dependencies).
Boyce Codd Normal Form (BCNF) is a further refinement of 3NF. In his later
writings Codd refers to BCNF as 3NF. A row is in Boyce Codd normal form if, and
only if, every determinant is a candidate key. Most entities in 3NF are already in
BCNF.
ORACLE SET OF STATEMENTS:

Data Definition Language (DDL):
Create
Alter
Drop
Truncate

Data Manipulation Language (DML):
Insert
Update
Delete

Data Querying Language (DQL):
Select

Data Control Language (DCL):
Grant
Revoke

Transactional Control Language (TCL):
Commit
Rollback
Savepoint
Syntaxes:

Materialized view with complete refresh (reconstructed; the name MV_COMPLEX comes from the refresh call, and the defining query is an assumption):

CREATE MATERIALIZED VIEW MV_COMPLEX
REFRESH COMPLETE
AS
SELECT deptno, SUM(sal) total_sal FROM emp GROUP BY deptno;

To refresh it manually ('C' requests a complete refresh):

EXEC DBMS_MVIEW.REFRESH('MV_COMPLEX', 'C');
Case Statement:
Select NAME,
(CASE
WHEN (CLASS_CODE = 'Subscription')
THEN ATTRIBUTE_CATEGORY
ELSE TASK_TYPE
END) TASK_TYPE,
CURRENCY_CODE
From EMP
Decode():

Select empname, Decode(address, 'HYD', 'Hyderabad',
'Bang', 'Bangalore', address) as address from emp;
Procedure:

A minimal skeleton (only the parameter fragment survived; the procedure name and body are assumptions):

CREATE OR REPLACE PROCEDURE update_customer (
cust_id_IN IN NUMBER
)
AS
BEGIN
NULL; -- procedure logic goes here
END;
Trigger:

A minimal row-level trigger skeleton (the trigger name, table and IF condition are assumptions; only fragments survived):

CREATE OR REPLACE TRIGGER trg_emp_audit
AFTER INSERT OR UPDATE ON emp
REFERENCING
NEW AS NEW
OLD AS OLD
FOR EACH ROW
DECLARE
BEGIN
IF :NEW.sal = :OLD.sal THEN
NULL;
ELSE
-- Exec procedure
update_sysdate();
END IF;
END;
ORACLE JOINS:

Equi join
Non-equi join
Self join
Natural join
Cross join
Outer join

Equi Join/Inner Join:
A join where the join condition uses the '=' operator. It can be written with either the USING clause or the ON clause.
Non-Equi Join
A join which contains an operator other than '=' in the join condition.
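For example, a sketch using the classic SALGRADE table (assuming it exists alongside EMP):

Ex: SQL> select e.ename, e.sal, s.grade from emp e, salgrade s
where e.sal between s.losal and s.hisal;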
Self Join:
A join in which a table is joined to itself, using two different aliases for the same table.

Natural Join:
A join that implicitly joins the two tables on all columns having the same name, with no join condition written.

Cross Join:
A join that returns the Cartesian product: every row of one table combined with every row of the other.

Outer Join:
Outer join gives the non-matching records along with matching records.
Left Outer Join:
This will display all matching records, plus the records in the left-hand side table that have no match in the right-hand side table.
Ex: SQL> select empno,ename,job,dname,loc from emp e left outer join dept
d on(e.deptno=d.deptno);
Or, using the Oracle (+) operator in the WHERE clause:
e.deptno=d.deptno(+);
Page 12 of 111
This will display the all matching records and the records which are in right
hand side table those that are not in left hand side table.
Or
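By analogy with the left outer join example above (a sketch, since the original example was lost here):

Ex: SQL> select empno,ename,job,dname,loc from emp e right outer join dept
d on(e.deptno=d.deptno);
Or
e.deptno(+)=d.deptno;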
Full Outer Join:
This will display all matching records and the non-matching records from both tables.
Ex: SQL> select empno,ename,job,dname,loc from emp e full outer join dept
d on(e.deptno=d.deptno);
OR
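Since the (+) operator cannot appear on both sides of the join at once, the old-style equivalent is a union of the two outer joins, a sketch:

select empno,ename,job,dname,loc from emp e, dept d where e.deptno=d.deptno(+)
union
select empno,ename,job,dname,loc from emp e, dept d where e.deptno(+)=d.deptno;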
Page 13 of 111
Materialized View:
We can keep aggregated data in a materialized view. We can schedule the MV to refresh, whereas a table can't be scheduled; an MV can also be created based on multiple tables.
Inline view:
If we write a select statement in the FROM clause, that is nothing but an inline view.

Ex:
Get the department-wise max sal along with empname and empno, as sketched below.
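A sketch of the query (the original example was lost here):

select e.empno, e.ename, e.sal, e.deptno
from emp e,
(select deptno, max(sal) max_sal from emp group by deptno) m
where e.deptno = m.deptno and e.sal = m.max_sal;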
DELETE
The DELETE command is used to remove rows from a table. A WHERE clause
can be used to only remove some rows. If no WHERE condition is specified, all
rows will be removed. After performing a DELETE operation you need to
COMMIT or ROLLBACK the transaction to make the change permanent or to
undo it.
TRUNCATE
TRUNCATE removes all rows from a table. The operation cannot be rolled back.
As such, TRUNCATE is faster and doesn't use as much undo space as a DELETE.
DROP
The DROP command removes a table from the database. All the tables' rows,
indexes and privileges will also be removed. The operation cannot be rolled
back.
ROWID
A globally unique identifier for a row in a database. It is created at the time the row is inserted into a table, and destroyed when it is removed from a table. Its format is 'BBBBBBBB.RRRR.FFFF', where BBBBBBBB is the block number, RRRR is the slot (row) number, and FFFF is the file number.
ROWNUM
For each row returned by a query, the ROWNUM pseudo column returns a
number indicating the order in which Oracle selects the row from a table or set
of joined rows. The first row selected has a ROWNUM of 1, the second has 2,
and so on.
You can use ROWNUM to limit the number of rows returned by a query, as in
this example:
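A sketch of such a query (the original example was lost here):

select * from emp where rownum <= 10;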
The general form of a query with grouping is:

SELECT column_list
FROM table
[WHERE condition]
[GROUP BY group_by_expression]
[HAVING group_condition]
[ORDER BY column];
The WHERE clause cannot be used to restrict groups; the HAVING clause is used for that. Both the WHERE and HAVING clauses can be used to filter data: the WHERE clause restricts rows, while the HAVING clause restricts groups.
MERGE Statement
You can use the MERGE command to perform an insert and an update in a single command, matching the source and target on a join condition such as ON (s1.no = s2.no).
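A minimal sketch (the table and column names s1, s2, no and name are assumptions):

MERGE INTO s1
USING s2
ON (s1.no = s2.no)
WHEN MATCHED THEN
UPDATE SET s1.name = s2.name
WHEN NOT MATCHED THEN
INSERT (no, name) VALUES (s2.no, s2.name);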
Sub Query:
Example:
Select deptno, ename, sal from emp a where sal in (select sal from Grade
where sal_grade='A' or sal_grade='B');
Example:
Find all employees who earn more than the average salary in their department, as sketched below.
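A sketch using a grouped inline view (reconstructed; only the "group by B.department_id" fragment of the original survived):

select A.ename, A.sal, A.department_id
from emp A,
(select B.department_id, avg(B.sal) avg_sal
from emp B
group by B.department_id) C
where A.department_id = C.department_id
and A.sal > C.avg_sal;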
EXISTS:
Example:
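A sketch (the original example was lost here); EXISTS returns true as soon as the subquery finds a matching row:

select d.deptno, d.dname
from dept d
where exists (select 1 from emp e where e.deptno = d.deptno);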
Indexes:
1. Bitmap indexes are most appropriate for columns having low distinct
values—such as GENDER, MARITAL_STATUS, and RELATION. This
assumption is not completely accurate, however. In reality, a bitmap
index is always advisable for systems in which data is not frequently
updated by many concurrent systems. In fact, as I'll demonstrate here, a
bitmap index on a column with 100-percent unique values (a column
candidate for primary key) is as efficient as a B-tree index.
7. The table is large and most queries are expected to retrieve less than 2
to 4 percent of the rows
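For reference, the corresponding index creation statements look like this (table and column names are illustrative):

CREATE BITMAP INDEX emp_gender_bmx ON emp (gender);
CREATE INDEX emp_deptno_idx ON emp (deptno);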
It is a perfectly valid question to ask why hints should be used. Oracle comes with an optimizer that promises to optimize a query's execution plan. When this optimizer is really doing a good job, no hints should be required at all.
The ANALYZE statement can be used to gather statistics for a specific table,
index or cluster. The statistics can be computed exactly, or estimated based on a
specific number of rows, or a percentage of rows:
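For example (standard forms of the statement):

ANALYZE TABLE emp COMPUTE STATISTICS;
ANALYZE TABLE emp ESTIMATE STATISTICS SAMPLE 1000 ROWS;
ANALYZE TABLE emp ESTIMATE STATISTICS SAMPLE 20 PERCENT;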
Hint categories:
ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing
systems.
FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.
CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on statistics gathered.
Hints for Parallel Execution, e.g. /*+ parallel(a,4) */; specify a degree such as 2, 4 or 16.
Additional Hints
HASH
Hashes one table (full scan) and creates a hash index for that table. Then
hashes other table and uses hash index to find corresponding records.
Therefore not suitable for < or > join conditions.
/*+ use_hash */
ORDERED- This hint forces tables to be joined in the order specified. If you
know table X has fewer rows, then ordering it first may speed execution in a
join.
If an index cannot be created, we can fall back on /*+ parallel(table, 8) */ for SELECT and UPDATE statements, for example when the WHERE clause uses operators such as LIKE, NOT IN, >, < or <>, which prevent index use.
Explain plan will tell us whether the query is properly using indexes or not, what the cost of the table access is, and whether it is doing a full table scan or not; based on these statistics we can tune the query.
The explain plan process stores data in the PLAN_TABLE. This table can be located in the current schema or a shared schema, and can be created in SQL*Plus as follows.
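A sketch (utlxplan.sql is the standard script that creates PLAN_TABLE; the query being explained is illustrative):

SQL> @?/rdbms/admin/utlxplan.sql

EXPLAIN PLAN FOR
select e.ename, d.dname from emp e, dept d where e.deptno = d.deptno;

select * from table(DBMS_XPLAN.DISPLAY);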
What is your tuning approach if a SQL query is taking a long time? Or how do you tune a SQL query?

If a query is taking a long time, first we will run the query through Explain Plan; the explain plan process stores data in the PLAN_TABLE. It will give us the execution plan of the query, for example whether the query is using the relevant indexes on the joining columns, or whether indexes to support the query are missing.

If the joining columns don't have indexes, the query will do a full table scan; if it is a full table scan the cost will be higher, so we will create indexes on the joining columns and run the query again, which should give better performance. We also need to analyze the tables if the last analysis happened long ago. The ANALYZE statement can be used to gather statistics for a specific table, index or cluster.
If still have performance issue then will use HINTS, hint is nothing but a clue.
We can use hints like
ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing
systems.
FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.
CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on statistics gathered.
HASH
Hashes one table (full scan) and creates a hash index for that table. Then
hashes other table and uses hash index to find corresponding records.
Therefore not suitable for < or > join conditions.
/*+ use_hash */
Stored Procedure:
What are the differences between stored procedures and triggers?
Using stored procedure we can access and modify data present in many
tables.
Also a stored procedure is not associated with any particular database object.
Stored procedures are not automatically run; they have to be called explicitly by the user. But triggers get executed when the particular event associated with them gets fired.
Packages:
A package contains several procedures and functions that process related transactions.
Triggers:
Oracle lets you define procedures called triggers that run implicitly when an INSERT, UPDATE, or DELETE statement is issued against the associated table. Triggers are similar to stored procedures; a trigger stored in the database can include SQL and PL/SQL.
INSTEAD OF Triggers
Row Triggers
A row trigger is fired each time the table is affected by the triggering statement.
For example, if an UPDATE statement updates multiple rows of a table, a row
trigger is fired once for each row affected by the UPDATE statement. If a
triggering statement affects no rows, a row trigger is not run.
When defining a trigger, you can specify the trigger timing--whether the trigger
action is to be run before or after the triggering statement. BEFORE and AFTER
apply to both statement and row triggers.
BEFORE and AFTER triggers fired by DML statements can be defined only on
tables, not on views.
Stored Procedure vs Function:

- A stored procedure may or may not return values, whereas a function should return at least one value, and can return more.
- A stored procedure accepts more than one argument, whereas a function does not accept arguments.
- Stored procedures are mainly used to process tasks; functions are mainly used to compute values.
- A stored procedure can affect the state of the database using commit; a function cannot affect the state of the database.
Table Space:
A database is divided into one or more logical storage units called tablespaces.
Tablespaces are divided into logical units of storage called segments.
Control File:
Query to find duplicate records:
Select empno, count (*) from EMP group by empno having count (*)>1;

Query to delete duplicates, keeping one row per empno:
Delete from EMP where rowid not in (select max (rowid) from EMP group by
empno);
UNION
select
emp_id,
max(decode(row_id,0,address)) as address1,
max(decode(row_id,1,address)) as address2, -- reconstructed line; the row_id value is assumed
max(decode(row_id,2,address)) as address3
from emp_addresses -- table name assumed; the original omitted it
group by emp_id;
Other query:

select
emp_id,
max(decode(rank_id,1,address)) as add1,
max(decode(rank_id,2,address)) as add2,
max(decode(rank_id,3,address)) as add3
from emp_addresses -- table name assumed; the original omitted it
group by emp_id;
5. Rank query:
Select empno, ename, sal, r from (select empno, ename, sal, rank () over (order
by sal desc) r from EMP);
The DENSE_RANK function works like the RANK function, except that it assigns consecutive ranks:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over
(order by sal desc) r from emp);
Or
Select * from (select * from EMP order by sal desc) where rownum<=5;
8. 2nd highest sal:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over
(order by sal desc) r from EMP) where r=2;
9. Top sal:
Select * from EMP where sal= (select max (sal) from EMP);
11. Hierarchical queries:
Starting at the root, walk from the top down, and eliminate employee Higgins in the result, but process the rows of the employees who report to Higgins.
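A sketch against the standard HR EMPLOYEES table (reconstructed, since only fragments of the original query survived):

SELECT employee_id, last_name, manager_id, LEVEL
FROM employees
WHERE last_name <> 'Higgins'
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;

The WHERE clause removes Higgins from the result, but the CONNECT BY still walks through him, so his subordinates are returned.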
3 DWH CONCEPTS
What is BI?
Business Intelligence refers to a set of methods and techniques that are used by organizations for tactical and strategic decision making. It leverages methods and technologies for gathering, storing and analyzing data to support better decisions.

What is a Data Warehouse?
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process (Inmon's definition).
Subject Oriented:
Data is organized around the major subjects of the enterprise, such as customer, product and sales, rather than around the operational applications.
Integrated:
Data that is gathered into the data warehouse from a variety of sources and
merged into a coherent whole.
Time-variant:
All data in the data warehouse is identified with a particular time period.
Non-volatile:
Data is stable in a data warehouse. More data is added but data is never
removed.
What is a DataMart?
A data mart is a subset of a data warehouse that is focused on a single subject area or business line. In terms of design, a data warehouse and a data mart are almost the same.
A fact table that contains only primary keys from the dimension tables, and no measure columns, is called a factless fact table.
What is a Schema?
A schema is the logical arrangement of fact and dimension tables in the warehouse, for example a star schema or a snowflake schema.
Star schema is a data warehouse schema where there is only one "fact table"
and many denormalized dimension tables.
The fact table contains primary keys from all the dimension tables and other columns of additive, numeric facts.
Star Schema vs Snowflake Schema:

- The star schema is the simplest data warehouse schema; the snowflake schema is a more complex data warehouse model than the star schema.
- In a star schema, only one join establishes the relationship between the fact table and any one of the dimension tables. In a snowflake schema, since there are relationships between the dimension tables, many joins are needed to fetch the data.
What is a Dimension?
A set of level properties that describe a specific aspect of a business, used for analyzing the factual measures.
Types of facts?

Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others; account balances, for example, cannot be summed across time.
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions in the fact table, such as ratios and percentages.
What is Granularity?
Principle: create fact tables with the most granular data possible to support
analysis of the business process.
In Data warehousing grain refers to the level of detail available in a given fact table, as well as to the level of detail provided by a star schema.

It is usually given as the number of records per key within the table. In general, the grain of the fact table is the grain of the star schema.
Facts: Facts must be consistent with the grain; all facts are at a uniform grain.
Dimensions: each dimension associated with fact table must take on a single
value for each fact row.
Dimensional Model
Conformed Dimensions (CD): these dimensions are something that is built once
in your model and can be reused multiple times with different fact tables. For
example, consider a model containing multiple fact tables, representing
different data marts. Now look for a dimension that is common to these facts
tables. In this example let’s consider that the product dimension is common
and hence can be reused by creating short cuts and joining the different fact
Page 38 of 111
Junk Dimension: when you consolidate lots of small dimensions, instead of having hundreds of small dimensions that each hold only a few records, cluttering your database with these mini 'identifier' tables, all records from all these small dimension tables are loaded into ONE dimension table, and we call this the junk dimension table (since we are storing all the junk in this one table). For example, a company might have a handful of manufacturing plants, a handful of order types, and so on, and we can consolidate them in one table called the junk dimension table.
Degenerated Dimension: an item that is in the fact table but is stripped of its description, because the description belongs in a dimension table, is referred to as a degenerated dimension. Since it looks like a dimension, but is really in the fact table and has been degenerated of its description, it is called a degenerated dimension.
Data modeling
There are three levels of data modeling. They are conceptual, logical, and
physical. This section will explain the difference among the three and the order in which each one is created.

Conceptual Data Model: the important entities and the relationships among them are identified; no attribute is specified.
Logical Data Model: at this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how it will be physically implemented in the database.
In data warehousing, it is common for the conceptual data model and the
logical data model to be combined into a single step (deliverable).
The steps for designing the logical data model are as follows:
6. Normalization.
Physical Data Model: at this level, the data modeler specifies how the logical data model will be realized in the database schema.
9. http://www.learndatamodeling.com/dm_standard.htm
The differences between a logical data model and a physical data model are shown below.

Logical Data Model -> Physical Data Model
Entity -> Table
Attribute -> Column
Definition -> Comment
Page 42 of 111
Page 43 of 111
ACW_ORGANIZATION_D
ACW_DF_FEES_STG ACW_DF_FEES_F Primary Key
Non-Key Attributes Primary Key ORG_KEY [PK1]
SEGMENT1 ACW_DF_FEES_KEY Non-Key Attributes
ORGANIZATION_ID [PK1] ORGANIZATION_CODE
ITEM_TYPE
Non-Key Attributes CREA TED_BY
BUYER_ID CREA TION_DATE
PRODUCT_KEY
COST_REQUIRED
ORG_KEY LAST_UPDATE_DATE
QUARTER_1_COST LAST_UPDATED_BY
DF_MGR_KEY
QUARTER_2_COST
COST_REQUIRED D_CREATED_BY
QUARTER_3_COST D_CREATION_DATE
DF_FEES PID for DF Fees
QUARTER_4_COST
COSTED_BY D_LAST_UPDATE_DATE
COSTED_BY
COSTED_DATE D_LAST_UPDATED_BY
COSTED_DATE
APPROV ING_MGR
APPROV ED_BY
APPROV ED_DATE
APPROV ED_DATE
D_CREATED_BY
D_CREATION_DATE ACW_USERS_D
D_LAST_UPDATE_BY Primary Key
D_LAST_UPDATED_DATE USER_KEY [PK1]
Non-Key Attributes
EDW_TIME_HIERARCHY
PERSON_ID
EMAIL_ADDRESS
ACW_PCBA_A PPROVAL_F LAST_NAME
Primary Key FIRST_NAME
ACW_PCBA_A PPROVAL_STG FULL_NAME
PCBA _APPROVAL_KEY
Non-Key Attributes [PK1] EFFECTIV E_STA RT_DATE
INV ENTORY_ITEM_ID Non-Key Attributes EFFECTIV E_END_DATE
LATEST_REV PART_KEY EMPLOYEE_NUMBER
LOCATION_ID LAST_UPDATED_BY
CISCO_PART_NUMBER
LOCATION_CODE SUPPLY_CHANNEL_KEY LAST_UPDATE_DATE
APPROV AL_FLAG CREA TION_DATE
NPI
ADJUSTMENT APPROV AL_FLAG CREA TED_BY
APPROV AL_DATE D_LAST_UPDATED_BY
ADJUSTMENT
TOTA L_ADJUSTMENT D_LAST_UPDATE_DATE
APPROV AL_DATE
TOTA L_ITEM_COST D_CREATION_DATE
ADJUSTMENT_AMT
DEMAND D_CREATED_BY
SPEND_BY _ASSEMBLY
COMM_MGR COMM_MGR_KEY ACW_PRODUCTS_D
BUYER_ID Primary Key
BUYER_ID
BUYER RFQ_CREATED ACW_PART_TO_PID_D PRODUCT_KEY [PK1]
RFQ_CREATED Users
Primary Key Non-Key Attributes
RFQ_RESPONSE
RFQ_RESPONSE
CSS PART_TO_PID_KEY [PK1] PRODUCT_NA ME
CSS D_CREATED_BY Non-Key Attributes BUSINESS_UNIT_ID
D_CREATED_DATE PART_KEY BUSINESS_UNIT
D_LAST_UPDATED_BY CISCO_PART_NUMBER PRODUCT_FAMILY_ID
ACW_DF_A PPROVAL_STG D_LAST_UPDATE_DATE PRODUCT_KEY PRODUCT_FAMILY
Non-Key Attributes PRODUCT_NA ME ITEM_TYPE
LATEST_REVISION D_CREATED_BY
INV ENTORY_ITEM_ID ACW_DF_A PPROVAL_F
D_CREATED_BY D_CREATION_DATE
CISCO_PART_NUMBER Primary Key
D_CREATION_DATE D_LAST_UPDATE_BY
LATEST_REV
DF_APPROVAL_KEY D_LAST_UPDATED_BY D_LAST_UPDATED_DATE
PCBA _ITEM_FLAG [PK1]
APPROV AL_FLAG D_LAST_UPDATE_DATE
Non-Key Attributes
APPROV AL_DATE
LOCATION_ID PART_KEY
LOCATION_CODE CISCO_PART_NUMBER
BUYER SUPPLY_CHANNEL_KEY
BUYER_ID PCBA _ITEM_FLAG
RFQ_CREATED APPROV ED ACW_SUPPLY_CHA NNEL_D
RFQ_RESPONSE APPROV AL_DATE
Primary Key
CSS BUYER_ID
RFQ_CREATED SUPPLY_CHANNEL_KEY
[PK1]
RFQ_RESPONSE
CSS Non-Key Attributes
D_CREATED_BY SUPPLY_CHANNEL
D_CREATION_DATE DESCRIPTION
D_LAST_UPDATED_BY LAST_UPDATED_BY
D_LAST_UPDATE_DATE LAST_UPDATE_DATE
CREA TED_BY
CREA TION_DATE
D_LAST_UPDATED_BY
D_LAST_UPDATE_DATE
D_CREATED_BY
D_CREATION_DATE
Page 45 of 111
EDW_TIME_HIE RARCHY
ACW_US ERS_D
ACW_PCBA_APPROVAL_F Columns
Columns USE R_K EY NUMB ER(10) [P K1]
PCB A_A PPROVAL_KEY CHA R(10) [PK1] PERSON_ID CHA R(10)
ACW_PCBA_APPROVAL_STG
PART_K EY NUMB ER(10) EMAIL_ADDRESS CHA R(10)
Columns
CISCO_PA RT _NUMBE R CHA R(10) LAST_NAM E VARCHAR2(50)
INVENTORY_IT EM_ID NUMB ER(10) FIRST _NAME VARCHAR2(50)
LATEST _REV CHA R(10) SUP PLY _CHANNE L_KEYNUMB ER(10)
NPI CHA R(1) FULL_NAM E CHA R(10)
LOCATION_ID NUMB ER(10) EFFECTIVE_START _DATE
DAT E
APP ROV AL_FLAG CHA R(1)
LOCATION_CODE CHA R(10) EFFECTIVE_END_DAT E DAT E
APP ROV AL_FLAG CHA R(1) ADJUSTME NT CHA R(1)
APP ROV AL_DA TE DAT E EMPLOYEE_NUMBER NUMB ER(10)
ADJUSTME NT CHA R(1) SEX NUMB ER
APP ROV AL_DA TE DAT E ADJUSTME NT_AMT FLOAT(12)
SPE ND_BY_ASSE MBLYFLOAT(12) LAST_UPDATE_DATE DAT E
TOT AL_ADJUSTMENT CHA R(10) CRE ATION_DATE DAT E
TOT AL_ITEM _COST FLOAT(10) COMM_MGR_K EY NUMB ER(10)
BUY ER_ID NUMB ER(10) CRE ATED_BY NUMB ER(10)
DEMA ND NUMB ER D_LAST_UPDATED_BY CHA R(10)
COMM_MGR CHA R(10) RFQ_CREATED CHA R(1)
RFQ_RE SPONSE CHA R(1) D_LAST_UPDATE_DATE DAT E
BUY ER_ID NUMB ER(10) D_CREA TION_DATE DAT E
BUY ER VARCHAR2(240) CSS CHA R(10)
D_CREA TED_BY CHA R(10) D_CREA TED_BY CHA R(10)
RFQ_CREATED CHA R(1)
D_CREA TED_DAT E CHA R(10)
RFQ_RE SPONSE CHA R(1)
CSS CHA R(10) D_LAST_UPDATED_BY CHA R(10)
D_LAST_UPDATE_DATEDAT E
ACW_PRODUCTS_D
Columns
ACW_DF_APPROVA L_STG
PRODUCT_KEY NUMB ER(10) [P K1]
Columns
PRODUCT_NAME CHA R(30)
INVENTORY_IT EM_ID NUMB ER(10) BUS INESS _UNIT_ID NUMB ER(10)
CISCO_PA RT _NUMBE RCHA R(30) ACW_DF_APPROVA L_F ACW_PA RT_TO_PID_D
BUS INESS _UNIT VARCHAR2(60)
LATEST _REV CHA R(10) Columns Columns
PRODUCT_FAM ILY_ID NUMB ER(10)
PCB A_ITEM_FLAG CHA R(1) DF_APPROVAL_KEY NUMB ER(10) [P K1] PART_T O_PID_KEY NUMB ER(10) [P K1] PRODUCT_FAM ILY VARCHAR2(180)
APP ROV AL_FLAG CHA R(1) PART_K EY NUMB ER(10) PART_K EY NUMB ER(10) IT EM_TYPE CHA R(30)
APP ROV AL_DA TE DAT E CISCO_PA RT_NUMBE R CHA R(30) CISCO_PA RT_NUMBE RCHA R(30) D_CREA TED_BY CHA R(10)
LOCATION_ID NUMB ER(10) SUP PLY _CHANNE L_KEYNUMB ER(10) PRODUCT_KEY NUMB ER(10) D_CREA TION_DATE DAT E
SUP PLY _CHANNE L CHA R(10) PCB A_ITEM_FLAG CHA R(1) PRODUCT_NAME CHA R(30)
D_LAST_UPDATE_BY CHA R(10)
BUY ER VARCHAR2(240) APP ROV ED CHA R(1) LATEST _REVIS ION CHA R(10) D_LAST_UPDATED_DAT CHA
E R(10)
BUY ER_ID NUMB ER(10) APP ROV AL_DA TE DAT E D_CREA TED_BY CHA R(10)
RFQ_CREATED CHA R(1) BUY ER_ID NUMB ER(10) D_CREA TION_DATE DAT E
RFQ_RE SPONSE CHA R(1) RFQ_CREATED CHA R(1) D_LAST_UPDATED_BYCHA R(10)
CSS CHA R(10) RFQ_RE SPONSE CHA R(1) D_LAST_UPDATE_DATE DAT E
CSS CHA R(10)
D_CREA TED_BY CHA R(10)
D_CREA TION_DATE DAT E
D_LAST_UPDATED_BY CHA R(10)
D_LAST_UPDATE_DATEDAT E
ACW_SUPPLY_CHANNEL_D
Columns
SUP PLY _CHANNE L_KEYNUMB ER(10) [P K1]
SUP PLY _CHANNE L CHA R(60)
DES CRIPT ION VARCHAR2(240)
LAST_UPDATED_BY NUMB ER
LAST_UPDATE_DATE DAT E
CRE ATED_BY NUMB ER(10)
CRE ATION_DATE DAT E
D_LAST_UPDATED_BY CHA R(10)
D_LAST_UPDATE_DATEDAT E
D_CREA TED_BY CHA R(10)
D_CREA TION_DATE DAT E
Users
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:
Advantages:
- This is the easiest way to handle the Slowly Changing Dimension problem,
since there is no need to keep track of the old information.
Disadvantages:
- All history is lost. By applying this methodology, it is not possible to trace back in history.

Usage:
Type 1 slowly changing dimension should be used when it is not necessary for
the data warehouse to keep track of historical changes.
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. After Christina moved from Illinois to California, we add the new information as a new row into the table:
Advantages:
- This allows us to keep all historical information accurately.
Disadvantages:
- This will cause the size of the table to grow fast. In cases where the number of
rows for the table is very high to start with, storage and performance can
become a concern.
Usage:
Type 2 slowly changing dimension should be used when it is necessary for the
data warehouse to track historical changes.
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the
particular attribute of interest, one indicating the original value, and one
indicating the current value. There will also be a column that indicates when
the current value becomes active.
Customer Key | Name | Original State | Current State | Effective Date
After Christina moved from Illinois to California, the original information gets
updated, and we have the following table (assuming the effective date of
change is January 15, 2003):
Advantages:
- This does not increase the size of the table, since new information is updated.
Disadvantages:
- Type 3 will not be able to keep all history where an attribute is changed more
than once. For example, if Christina later moves to Texas on December 15,
2003, the California information will be lost.
Usage:
Type 3 slowly changing dimension should only be used when it is necessary to track both the original and the current value of an attribute, and when changes occur only a finite number of times.
Why staging tables help: if the target and source databases are different and the target table volume is high (some millions of records), then without a staging table we need to design the Informatica mapping using a lookup to find out whether the record exists or not in the target table; since the target has huge volumes, creating the cache is costly and will hurt performance.
If we create staging tables in the target database, we can simply do an outer join in the source qualifier to determine insert/update; this approach will give good performance.
Data cleansing
Data merging
My understanding of ODS is that it is a replica of the OLTP system, and the need for it is to reduce the burden on the production system (OLTP) while fetching data for loading targets; hence it is a mandatory requirement for every warehouse. OLTP is a sensitive database: it should not be hit with many large select statements, since that may impact performance, and if something goes wrong while fetching data from OLTP to the data warehouse, it will directly impact the business.
A surrogate key is any column or set of columns that can be declared as the
primary key instead of a "real" or natural key. Sometimes there can be several
natural keys that could be declared as the primary key, and these are all called
candidate keys. So a surrogate is a candidate key. A table could actually have
more than one surrogate key, although this would be unusual. The most
common type of surrogate key is an incrementing integer, such as an auto
increment column in MySQL, or a sequence in Oracle, or an identity column in
SQL Server.
4 ETL-INFORMATICA
Informatica is a powerful Extraction, Transformation, and Loading tool and has been deployed at GE Medical Systems for data warehouse development in the Business Intelligence Team.
Informatica comes with the following clients to perform various tasks.
Informatica Transformations:
Mapplet:
A mapplet is a reusable object that contains a set of transformations. A mapplet cannot include the following objects:
o Normalizer transformations
o COBOL sources
o XML sources
o Target definitions
o Other mapplets
The mapplet contains at least one Output transformation with at least one
port connected to a transformation in the mapplet.
System Variables
$$$SessStartTime returns the initial system date value on the machine hosting
the Integration Service when the server initializes a session. $$$SessStartTime
returns the session start time as a string value. The format of the string
depends on the database you are using.
Filter: The Filter transformation is used to filter data based on a single condition and pass it through to the next transformation.

Router: The Router transformation is used to route data based on multiple conditions and pass it through to the next transformations. A Router transformation has the following groups:
1) Input group
2) User-defined output groups
3) Default group

A Lookup transformation can be either:
1) Connected
2) Unconnected
Lookup Caches:
When configuring a lookup cache, you can specify any of the following options:
Persistent cache
Static cache
Dynamic cache
Shared cache
Dynamic cache: When you use a dynamic cache, the PowerCenter Server
updates the lookup cache as it passes rows to the target.
If you configure a Lookup transformation to use a dynamic cache, you can only
use the equality operator (=) in the lookup condition.
When you use a dynamic cache, the Lookup transformation adds a NewLookupRow output port. Its value describes the action the PowerCenter Server took: 0 means it did not update or insert the row in the cache, 1 means it inserted the row into the cache, and 2 means it updated the row in the cache.
Static cache: It is a default cache; the PowerCenter Server doesn’t update the
lookup cache as it passes rows to the target.
Persistent cache: If the lookup table does not change between sessions,
configure the Lookup transformation to use a persistent lookup cache. The
PowerCenter Server then saves and reuses cache files from session to session,
eliminating the time required to read the lookup table.
The best example of where we need to use a dynamic cache: suppose the first record and the last record in the source are the same, but there is a change in the address. What the Informatica mapping has to do here is insert the first record and update the target with the last record.

If we use a static lookup instead, the first record will go to the lookup and check the lookup cache; based on the condition it will not find a match, so it will return a null value, and the router will send that record to the insert flow. But this record is still not available in the cache memory, so when the last record comes to the lookup it will again check the cache, find no match, return a null value, and go to the insert flow through the router; however, it is supposed to go to the update flow, because the cache didn't get refreshed when the first record was inserted into the target table.
Rank: The Rank transformation allows you to select only the top or bottom rank
of data. You can use a Rank transformation to return the largest or smallest
numeric value in a port or group.
Union Transformation:
The Union transformation is a multiple input group transformation that you can
use to merge data from multiple pipelines or pipeline branches into one
pipeline branch. It merges data from multiple sources similar to the UNION ALL
SQL statement to combine the results from two or more SQL statements.
Similar to the UNION ALL statement, the Union transformation does not remove duplicate rows. Input groups should have a similar structure.
1) Mapping level
2) Session level.
Transformation type:
Active
Connected
Aggregate cache: The Integration Service stores data in the aggregate cache
until it completes aggregate calculations. It stores group values in an index
cache and row data in the data cache.
Group by port: Indicate how to create groups. The port can be any input,
input/output, output, or variable port. When grouping data, the Aggregator
transformation outputs the last row of each group unless otherwise specified.
Sorted input: Select this option to improve session performance. To use sorted
input, you must pass data to the Aggregator transformation sorted by group by
port, in ascending or descending order.
Aggregate Expressions:
Aggregate Functions
(AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, SUM, VARIANCE and STDDEV)
When you use any of these functions, you must use them in an expression
within an Aggregator transformation.
Use sorted input to increase mapping performance, but we need to sort the data before sending it to the Aggregator transformation.
SQL Transformation
Transformation type:
Active/Passive
Connected
The SQL transformation processes SQL queries midstream in a pipeline. You can
insert, delete, update, and retrieve rows from a database. You can pass the
database connection information to the SQL transformation as input data at run
time. The transformation processes external SQL scripts or SQL queries that you
create in an SQL editor. The SQL transformation processes the query and
returns rows and database errors.
For example, you might need to create database tables before adding new
transactions. You can create an SQL transformation to create the tables in a
workflow. The SQL transformation returns database errors in an output port.
You can configure another workflow to run if the SQL transformation returns no
errors.
Script mode. The SQL transformation runs ANSI SQL scripts that are externally
located. You pass a script name to the transformation with each input row. The
SQL transformation outputs one row for each input row.
Query mode. The SQL transformation executes a query that you define in a
query editor. You can pass strings or parameters to the query to define dynamic
queries or change the selection parameters. You can output multiple rows
when the query has a SELECT statement.
Database type. The type of database the SQL transformation connects to.
Script Mode
An SQL transformation configured for script mode has the following default
ports:
ScriptName (Input): receives the name of the script to execute for the current row.
ScriptResult (Output): returns PASSED if the script execution succeeds for the row; otherwise contains FAILED.
ScriptError (Output): returns errors that occur when a script fails for a row.
Java Transformation

Transformation type:
Active/Passive
Connected
For example, you can define transformation logic to loop through input rows
and generate multiple output rows based on a specific condition. You can also
use expressions, user-defined functions, unconnected transformations, and
mapping variables in the Java code.
Transaction Control Transformation

Transformation type:
Active
Connected
PowerCenter lets you control commit and roll back transactions based on a set
of rows that pass through a Transaction Control transformation. A transaction is
the set of rows bound by commit or roll back rows. You can define a transaction
based on a varying number of input rows. You might want to define
transactions based on a group of rows ordered on a common key, such as
employee ID or order entry date.
Within a session. When you configure a session, you configure it for user-
defined commit. You can choose to commit or roll back a transaction if the
Integration Service fails to transform or write any row to the target.
When you run the session, the Integration Service evaluates the expression for
each row that enters the transformation. When it evaluates a commit row, it
commits all rows in the transaction to the target or targets. When the
Integration Service evaluates a roll back row, it rolls back all rows in the
transaction from the target or targets.
If the mapping has a flat file target you can generate an output file each time
the Integration Service starts a new transaction. You can dynamically name
each target flat file.
Transaction control expression:
The expression contains values that represent actions the Integration Service
performs based on the return value of the condition. The Integration Service
evaluates the condition on a row-by-row basis. The return value determines
whether the Integration Service commits, rolls back, or makes no transaction
changes to the row. When the Integration Service issues a commit or roll back
based on the return value of the expression, it begins a new transaction. Use
the following built-in variables in the Expression Editor when you create a transaction control expression: TC_CONTINUE_TRANSACTION, TC_COMMIT_BEFORE, TC_COMMIT_AFTER, TC_ROLLBACK_BEFORE, and TC_ROLLBACK_AFTER.
Source Qualifier Join vs Lookup:

- A join in the source qualifier pushes through all the matching records, whereas in a lookup we can choose whether to return the first value, the last value or any value.
- When both the source and the lookup table are in the same database, we can join them in the source qualifier; when they exist in different databases, we need to use a lookup.
1) Yes. One of my mappings was taking 3-4 hours to process 40 million rows into a staging table. We didn't have any transformation inside the mapping; it was a 1-to-1 mapping, so there was nothing to optimize in the mapping itself. I created session partitions using key range on the effective date column. It improved performance a lot: rather than 4 hours, it was running in 30 minutes for the entire 40 million rows. Using partitions, the DTM creates multiple reader and writer threads.

2) There was one more scenario where I got very good performance at the mapping level. Rather than using a lookup transformation, if we can do an outer join in the source qualifier query override, this will give good performance, provided both the lookup table and the source are in the same database. If the lookup table has huge volumes, then creating the cache is costly.
4) If any mapping is taking a long time to execute, then first we need to look into the session log. The session log shows the busy percentage of each thread; based on that, we need to find out where the bottleneck is.
***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] ****
If suppose I've to load 40 lacs (4 million) records into the target table and the workflow is taking about 10-11 hours to finish, I've already increased the cache size to 128MB, and there are no joiners, just lookups and expression transformations:

Ans: In this case, drop constraints and indexes before you run the workflow and rebuild them after the load. Generally, the load writes the data first into the parent table and then into the child table.
Let’s assume we have imported some source and target definitions in a shared
folder after that we are using those sources and target definitions in another
folders as a shortcut in some mappings.
If any modifications occur in the backend (database) structure, like adding new columns or dropping existing columns in either the source or the target, and we reimport into the shared folder, those new changes are automatically reflected in all folders/mappings wherever we used those source or target definitions.
If we don't have a primary key on the target table, we can use the Target Update Override to specify how to update it.
You can override the WHERE clause to include non-key columns. For example,
you might want to update records for employees named Mike Smith only. To do
this, you edit the WHERE clause as follows:
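An example along the lines of the PowerCenter documentation (the T_SALES table and its ports are illustrative):

UPDATE T_SALES
SET DATE_SHIPPED = :TU.DATE_SHIPPED,
TOTAL_SALES = :TU.TOTAL_SALES
WHERE CUST_ID = :TU.CUST_ID AND EMP_NAME = 'MIKE SMITH'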
If you modify the UPDATE portion of the statement, be sure to use :TU to
specify ports.
4) Because it is a variable, it stores the max last_upd_date value in the repository; in the next run, our source qualifier query will fetch only the records updated or inserted after the previous run.
3) Create two stored procedures, one of which updates cont_tbl_1 with the session start time; set the stored procedure type property to Source Pre-load.
Update the previous record eff-end-date with sysdate and insert as a new
record with source data.
Once we fetch the record from the source qualifier, we send it to a lookup to find out whether the record is present in the target or not, based on the source primary key column.

Once we find the match in the lookup, we take the SCD columns from the lookup and the source columns from the SQ into an expression transformation.

If the source and target data are the same, then I set the flag to 'S'.
If the source and target data are different, then I set the flag to 'U'.
If the source data does not exist in the target, the lookup returns a null value, and I set the flag to 'I'.

Based on the flag values, the router routes the data into the insert and update flows, as sketched below.
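A sketch of the flag logic in the expression transformation (port names are assumptions):

v_FLAG = IIF(ISNULL(lkp_EMPNO), 'I', IIF(src_data <> lkp_data, 'U', 'S'))
o_FLAG = v_FLAG

In the router, the insert group tests o_FLAG = 'I' and the update group tests o_FLAG = 'U'; rows flagged 'S' are dropped.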
Complex Mapping

The source file directory contains files older than 30 days, with timestamps in the file names. For this requirement, if I hardcode the timestamped file name as the source file name, it will process the same file every day. So I use a parameter file to supply the value of the session variable ($InputFilename). A separate mapping updates the parameter file with the timestamp appended to the file name, and I make sure to run this parameter-file-update mapping before my actual mapping.
We need to send those records to a flat file after completion of the 1st session run. A shell script will check the file size: if the file size is greater than zero, it will send an email notification to the source system POC (point of contact), along with the denominator-zero records file and an appropriate email subject and body. If the file size <= 0, that means there are no records in the flat file, and in this case the shell script will not send any email notification. A sketch of the check follows.
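A minimal sketch of such a script (the file path and address are assumptions):

#!/bin/ksh
if [ -s /interface/dev/etl/deno_zero_records.dat ]; then
  mailx -s "Denominator zero records found" poc@example.com < /interface/dev/etl/deno_zero_records.dat
fi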
Or
We are expecting a not-null value for one of the source columns.
Source qualifier will select the data from the source table.
The parameter file supplies the values for session-level variables and mapping-level variables.
$DBConnection_Source
$DBConnection_Target
$InputFile
$OutputFile
Variable
Parameter
What is the difference between mapping-level and session-level variables?
Mapping-level variables and parameters (prefixed with $$) are declared in the mapping and used inside transformation logic; a mapping variable's final value can persist in the repository between runs. Session-level variables (prefixed with $, such as $DBConnection_Source or $InputFile) are resolved at the session level, typically for connections and file names.
Delimiter
Fixed Width
In fixed width we need to know the format first, i.e., how many characters to read for each column. If the file contains a header, then in the source definition we need to skip the first row.
List file:
If you want to process multiple files with the same structure, we don't need multiple mappings and multiple sessions. We can use one mapping and one session using the list file option. First we need to create the list file for all the files; then we can use this file in the main mapping.
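A sample list file is just one source-file path per line (the paths are illustrative):

/interface/dev/etl/apo/srcfiles/sales_at_20070921.dat
/interface/dev/etl/apo/srcfiles/sales_be_20070921.dat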
It is a text file; below is the format for a parameter file. We place this file on the Unix box where we have installed our Informatica server.
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRI]
$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfiles/HS_025_20070921
$DBConnection_Target=DMD2_GEMS_ETL
$$CountryCode=AT
$$CustomerNumber=120165
$DBConnection_Source=DEVL1C1_GEMS_ETL
$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/HS_002_20070921
$$CountryCode=BE
$$CustomerNumber=101495
• Client applications are the same, but work on top of the new services
framework
8) Concurrent cache creation and faster index building are additional features of the lookup transformation.
13) Flat file names can be populated to the target while processing through a list file.
14) For flat files, headers and footers can be populated using advanced options at the session level in 8.
Effective in version 8.0, you create and configure a grid in the Administration
Console. You configure a grid to run on multiple nodes, and you configure one
Integration Service to run on the grid. The Integration Service runs processes on
the nodes in the grid to distribute workflows and sessions. In addition to
running a workflow on a grid, you can now run a session on a grid. When you
run a session or workflow on a grid, one service process runs on each available
node in the grid.
2. IS starts ISP
Manages the data from source system to target system within the
memory and disk
Load Balancer
The Integration Service starts one or more Integration Service processes to run
and monitor workflows. When we run a workflow, the ISP starts and locks the
workflow, runs the workflow tasks, and starts the process to run sessions. The
functions of the Integration Service Process are,
Load Balancer
2. The Load Balancer dispatches all tasks to the node that runs the master
Integration Service process
1. The Load Balancer verifies which nodes are currently running and enabled
2. The Load Balancer identifies nodes that have the PowerCenter resources
required by the tasks in the workflow
3. The Load Balancer verifies that the resource provision thresholds on each
candidate node are not exceeded. If dispatching the task causes a
threshold to be exceeded, the Load Balancer places the task in the
dispatch queue, and it dispatches the task later
When the workflow reaches a session, the Integration Service Process starts the
DTM process. The DTM is the process associated with the session task. The
DTM process performs the following tasks:
Adds partitions to the session when the session is configured for dynamic
partitioning.
Sends a request to start worker DTM processes on other nodes when the
session is configured to run on a grid.
Runs post-session stored procedures, SQL, and shell commands and sends
post-session email
In the expression, assign the max last-update-date value to the variable using the SETMAXVARIABLE function.
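For example (a sketch; the mapping variable and port names are assumptions):

SETMAXVARIABLE($$LAST_UPD_DATE, LAST_UPD_DATE)

SetMaxVariable keeps $$LAST_UPD_DATE at the highest LAST_UPD_DATE value seen during the session, and the final value persists in the repository for the next run.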
Because it is a mapping parameter, every time we need to update the value in the parameter file after completion of the main session.
Parameter file:
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_W
EEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRI]
$DBConnection_Source=DMD2_GEMS_ETL
$DBConnection_Target=DMD2_GEMS_ETL
Main mapping
Workflow Design
1) How to populate 1st record to 1st target ,2nd record to 2nd target
,3rd record to 3rd target and 4th record to 1st target through
informatica?
We can do this using a sequence generator by setting end value = 3 and enabling the cycle option. Then in the router create 3 groups:

In the 1st group specify the condition seq next value = 1 and pass those records to the 1st target; similarly,
in the 2nd group specify the condition seq next value = 2 and pass those records to the 2nd target;
in the 3rd group specify the condition seq next value = 3 and pass those records to the 3rd target.

Since we have enabled the cycle option, after reaching the end value the sequence generator starts again from 1; for the 4th record the seq next value is 1, so it goes to the 1st target.
I want to generate a separate file for every state (as per state, it should generate a file). It has to generate 2 flat files, and the name of each flat file is the corresponding state name; that is the requirement.
Below is my mapping.
Source:
AP 2 HYD
AP 1 TPT
KA 5 BANG
KA 7 MYSORE
KA 3 HUBLI
This functionality was added in Informatica 8.5 onwards; in earlier versions it was not there.
We can achieve it with use of transaction control and special "FileName" port in
the target file .
In order to generate the target file names from the mapping, we should make use of the special "FileName" port in the target file. You can't create this special port from the usual New port button; there is a special button with the label "F" on it at the right-most corner of the target flat file when viewed in the "Target Designer".
When you have different sets of input data with different target files created,
use the same instance, but with a Transaction Control transformation which
defines the boundary for the source sets.
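A sketch of the transaction control condition for this per-state file scenario (port names are assumptions; the data must arrive sorted by state):

IIF(STATE <> v_prev_state, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)

Here the FileName port is set to an expression such as STATE || '.dat', so each committed transaction is written to a file named after its state.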
In the target flat file there is an option in the columns tab, i.e., "FileName as column"; when you click it, a non-editable column gets created in the metadata of the target.
Ename EmpNo
stev 100
methew 100
john 101
tom 101
Target:
Ename EmpNo
stev methew 100
john tom 101
If the record doesn't exist, do an insert into the target. If it already exists, then get the corresponding Ename value from the lookup, concatenate it in an expression with the current Ename value, and then update the target Ename column using an update strategy.
Or: sort the data in the SQ based on the EmpNo column, then use an expression to store the previous record's information using variable ports; after that, use a router to insert the record if it is seen for the first time, and otherwise update Ename in the target with the concatenation of the previous name and the current name.
Source:
Ename EmpNo
stev 100
Stev 100
john 101
Mathew 102
Output:
Target_1:
Ename EmpNo
Stev 100
John 101
Mathew 102
Target_2:
Ename EmpNo
Stev 100
If the record doesn't exist, do an insert into Target_1; if it already exists, then send it to Target_2 using a router.

Or: sort the data in the SQ based on the EmpNo column, then use an expression to store the previous record's information using variable ports; after that, use a router to route the data into the targets: if the record is seen for the first time send it to Target_1, otherwise send it to Target_2.
We can process all flat files through one mapping and one session using a list file. First we need to create the list file using a Unix script for all the flat files; the extension of the list file is .LST.
If both workflows exist in the same folder, we can create 2 worklets rather than creating 2 workflows.
Then we can set the dependency between these two; using a shell script is one approach. If the workflows exist in different folders or different repositories, then we can use the approaches below.
As soon as the first workflow completes, we create a zero-byte file (indicator file). If the indicator file is not available, we wait for 5 minutes and check for the indicator again; we continue this loop 5 times, i.e., 30 minutes. After 30 minutes, if the file still does not exist, we send out an email notification.
We can also put an event wait before the actual session run in the workflow to wait for the indicator file: if the file is available, it runs the session; otherwise the event wait waits indefinitely until the indicator file is available.
Solution:
Using var ports in expression we can load cumulative salary into target.
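A sketch of the expression ports (Informatica evaluates input ports, then variable ports, then output ports, so the variable carries the running total from row to row):

v_CUM_SAL (variable port) = v_CUM_SAL + SAL
o_CUM_SAL (output port) = v_CUM_SAL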
To start development on any data mart you should have the following things set
up by the Informatica Load Administrator
Transformation Specifications
While estimating the time required to develop mappings the thumb rule is as
follows.
It's an accepted best practice to always load a flat file into a staging table before any further processing is done on the data in the file.
Always use LTRIM, RTRIM functions on string columns before loading data into
a stage table.
You can also use UPPER function on string columns but before using it you need
to ensure that the data is not case sensitive (e.g. ABC is different from Abc)
If you are loading data from a delimited file then make sure the delimiter is not
a character which could appear in the data itself. Avoid using comma-separated
files. Tilde (~) is a good delimiter to use.
Failure Notification
Once in production, your sessions and batches need to send out notification to the Support team when they fail. You can do this by configuring an email task at the session level.
Port Standards:
Input Ports: it will be necessary to change the name of input ports for lookups, expressions and filters where ports might have the same name. If ports do have the same name, they will default to having a number after the name. Change this default to a prefix of "in_"; this will allow you to keep track of input ports throughout your mappings.
Prefixed with: IN_
Variable Ports: variable ports in a transformation should be prefixed with a "v_". This will allow the developer to distinguish between input/output and variable ports. For more explanation of variable ports, see the section "VARIABLES".
Prefixed with: V_
Quick Reference
Aggregator AGG_<Purpose>
Expression EXP_<Purpose>
Filter FLT_<Purpose>
Rank RNK_<Purpose>
Router RTR_<Purpose>
Mapplet MPP_<Purpose>
1. Cache lookups if source table is under 500,000 rows and DON’T cache for
tables over 500,000 rows.
7. Avoid using Stored Procedures, and call them only once during the
mapping if possible.
8. Remember to turn off Verbose logging after you have finished debugging.
10.When overriding the Lookup SQL, always ensure you put a valid ORDER BY statement in the SQL. This will cause the database to perform the ordering, rather than the Informatica Server, while building the cache.
16.Define the source with fewer rows as the master source in Joiner Transformations, since this reduces the search time and also the cache.
19.If the lookup table is on the same database as the source table, instead of
using a Lookup transformation, join the tables in the Source Qualifier
Transformation itself if possible.
20.If the lookup table does not change between sessions, configure the
Lookup transformation to use a persistent lookup cache. The Informatica
Server saves and reuses cache files from session to session, eliminating
the time required to read the lookup table.
24.Reduce the number of rows being cached by using the Lookup SQL
Override option to add a WHERE clause to the default SQL statement.
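For example, a sketch of a lookup SQL override (table, columns and filter are illustrative; the trailing comment characters suppress the ORDER BY that the server generates):

SELECT emp.empno AS EMPNO, emp.ename AS ENAME
FROM emp
WHERE emp.active_flag = 'Y'
ORDER BY emp.empno --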
Testing regimens:
1. Unit Testing
2. Functional Testing
User Acceptance Testing(UAT): The testing of the entire application by the end-
users ensuring the application functions as set forth in the system requirements
documents and that the system meets the business needs.
UTP Template:
Each test case records: Step, Description, Test Conditions, Expected Results, Actual Results, Pass or Fail (P or F), and Tested By. The cases below cover the SAP-CMS interfaces.

Step 1
Description: Check the total count of records in the source tables that is fetched against the total records in the PRCHG table for a particular session timestamp.
Test Conditions:
SOURCE: SELECT count(*) FROM XST_PRCHG_STG
TARGET: SELECT count(*) FROM PRCHG
Expected Results: Both the source and target table load record counts should match.
Actual Results: Same as the expected. Pass. Tested by Stev.

Step 2
Description: Check all the target columns to see whether they are getting populated correctly with source data.
Test Conditions:
SELECT PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE
FROM T_PRCHG
MINUS
SELECT PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE
FROM PRCHG
Expected Results: The source-minus-target query should return zero records.
Actual Results: Same as the expected. Pass. Tested by Stev.

Step 3
Description: Check the insert strategy to load records into the target table.
Test Conditions: Identify one record from the source which is not in the target table, then run the session.
Expected Results: It should insert a record into the target table with source data.
Actual Results: Same as the expected. Pass. Tested by Stev.

Step 4
Description: Check the update strategy to load records into the target table.
Test Conditions: Identify one record from the source which is already present in the target table with different PRCHG_ST_CDE or PRCHG_TYP_CDE values, then run the session.
Expected Results: It should update the record in the target table with the source data for that existing record.
Actual Results: Same as the expected. Pass. Tested by Stev.
cd /pmar/informatica/pc/pmserver/
2) If we are supposed to process flat files using Informatica but those files exist on a remote server, then we have to write a script to FTP them onto the Informatica server before starting to process those files.
3) File watch means that if the indicator file is available in the specified location then we need to start our Informatica jobs; otherwise we will send an email notification.
4) Using a shell script, update the parameter file with the session start time and end time.
This is the kind of scripting knowledge I have. If any new UNIX requirement comes up, I can Google it, get the solution and implement it.
Basic Commands:
cat > file1 (creates a non-zero-byte file named file1 from what you type)
cat file1 file2 > all -- combines file1 and file2 into "all" (creates the file if it doesn't exist)
cat file1 >> file2 -- appends file1 to file2
ps -A
Crontab command.
Crontab command is used to schedule jobs. You must have permission from the Unix Administrator to run this command. Jobs are scheduled with five numbers, as follows.
Minutes (0-59) Hour (0-23) Day of month (1-31) month (1-12) Day of week (0-
6) (0 is Sunday)
So, for example, if you want to schedule a job which runs the script named backup_jobs in the /usr/local/bin directory at 22:25 on Sunday (day 0) and on the 15th of the month, the entry in the crontab file will be as follows (* represents all values):

25 22 15 * 0 /usr/local/bin/backup_jobs
who | wc -l
$ ls -l | grep '^d'
Pipes:
The pipe symbol "|" is used to direct the output of one command to the input
of another.
ls -a
find . -name aaa.txt    Finds all the files named aaa.txt in the current directory and its subdirectories.
sed (the usual sed command for global string search and replace):
If you want to replace 'foo' with the string 'bar' globally in a file, use the command sketched below.
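A sketch of the standard form (writing to a new file, since in-place editing options vary between sed versions):

sed -e 's/foo/bar/g' oldfile > newfile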
find / -name vimrc Find all the files named 'vimrc' anywhere on the system.
find / -name '*xpilot*'    Finds all files whose names contain the string 'xpilot', anywhere on the system.
You can find out what shell you are using by the command:
echo $SHELL
#!/usr/bin/sh
Or
#!/bin/ksh
It actually tells the script which interpreter to refer to. As you know, the bash shell has some specific features that other shells do not have, and vice versa; the same goes for perl, python and other languages. It tells your shell which program to use to execute the statements in your shell script.
Interactive History
A feature of bash and tcsh (and sometimes others): you can use the up-arrow key to recall previous commands, edit them and re-execute them.
Opening a file:
vi filename
Creating text
Edit modes: These keys enter editing modes and type in the text
of your document.
r Replace 1 character
R Replace mode
Deletion of text
:w! existing.file Overwrite an existing file with the file currently being edited.
:q Quit.