
E.F. Codd's 13 rules


The rules are numbered 0 to 12. They cover null handling, data organization, normalization and denormalization. OLAP is the front end; OLTP is primarily for storage of data and holds only current data.

Normal forms

Reasons for a data warehouse: the warehouse is quick; OLAP is read-intensive whereas OLTP is write-intensive (updating, deleting).

Denormalizing results in faster retrieval, whereas normalizing slows retrieval. A dimension table holds descriptive data and also contains a hierarchy. NOTE: Julian day (refer net). A dimension is part of a star/snowflake schema; it has a primary key, descriptive data and a hierarchy. A fact table does not contain descriptive data; it contains measures and the keys of the dimension tables.

A data warehouse is a 1. subject-oriented, 2. integrated, 3. time-variant, 4. non-volatile collection of data. Subject oriented: HR, marketing, sales, etc. Note: data doesn't get created in the warehouse; it gets created only in the operational system. The operational system is application-specific while the DW is integrated.

Time variant: the operational system holds basically current data; the data warehouse holds historical data that doesn't change over a period of time - it gets updated, but that is a rare occurrence. Non-volatile: not as volatile as the operational system. Insert, update, delete happen on the operational system; the warehouse only gets initial and incremental loads. Integrated: putting everything into one database (for DSS). E.g. APAC - India (XLS), Japan (database): bringing everything together in one system. Scrubbed: consistency in data - removing all inconsistencies [e.g. date formats]. Metadata: data which describes the data. Data scrubbing or data cleansing is the act of detecting and removing corrupt or inaccurate data (dirty data).

02/01/2011 - Day 2

A data mart is a subset of a DW; it is only a small piece of a DW and may or may not be integrated within one. Low cost, less information than a warehouse, and rapid response are some advantages of a DM. A DW is not a DW when it's an unarchitected collection of data marts. An ODS is integrated, subject-oriented and volatile - updates can be done on it. An ODS is used only to store data.

An ODS holds current operational data; historical data is stored in the warehouse.

OLAP
Reports on the DW

OLAP terms: Slice and dice - cross-sectional analysis of data, across geography, across time periods. Drill down and drill up - moving down and up the hierarchy. Multidimensional analysis - easier to do on a star schema. What-if analysis - predictive analysis or a comparison.

Staging - temporary storage of data; staging tables get truncated and reloaded. Data cleansing - removing inconsistencies. Data scrubbing and massaging - converting data from one format to another.
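The format-conversion idea in data scrubbing/massaging can be sketched in a few lines. This is a minimal illustration, not anything from the course material: the function name, the candidate formats and the canonical output format are all assumptions.

```python
from datetime import datetime

def scrub_date(raw: str) -> str:
    """Try a few common source-system formats and emit one canonical form (YYYY-MM-DD)."""
    for fmt in ("%d/%m/%Y", "%Y-%m-%d", "%d-%b-%Y"):
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # not this format, try the next
    raise ValueError(f"unrecognized date format: {raw!r}")

# Mixed source formats become consistent in the target:
print(scrub_date("02/01/2011"))   # 2011-01-02
print(scrub_date("02-Jan-2011"))  # 2011-01-02
```

In a real ETL tool this logic would live in a transformation rule rather than hand-written code, but the effect is the same: inconsistent representations are mapped to one agreed format before loading.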

An EDW contains detailed as well as summarized data. Grain - the smallest level of data. Granularity - moving from higher to lower levels of detail, or lower to higher. Top-down approach: the EDW is architected as a whole, then built module by module (DM by DM) - architected together but built one DM at a time. Bottom-up approach: first an enterprise data mart architecture (EDMA) is developed, then data marts are built one by one. Requirements are given by business users, so the DW is driven by business users, not by the IS team.

ER modelling: an entity (e.g. the sales org) is made up of attributes.

[Diagram: entity Sales Organization, with attributes Sales Org ID and Distribution Channel.]

A dimension puts a measure in perspective. A star schema is a centralized fact table surrounded by dimension tables: the dimension tables have a primary key, descriptive data and hierarchies, while the fact table has measures and foreign keys.
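A typical star-schema query joins the fact table to a dimension and aggregates a measure by a descriptive attribute. The tiny tables below are a made-up illustration (names and values are assumptions, not from the notes):

```python
# Dimension table: primary key -> descriptive attributes (hierarchy: name -> category).
product_dim = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Desk",   "category": "Furniture"},
}

# Fact table: foreign keys into the dimension + numeric measures only.
sales_fact = [
    {"product_key": 1, "amount": 1200.0},
    {"product_key": 1, "amount": 800.0},
    {"product_key": 2, "amount": 300.0},
]

# Join fact to dimension, then aggregate the measure by a dimension attribute.
totals = {}
for row in sales_fact:
    category = product_dim[row["product_key"]]["category"]
    totals[category] = totals.get(category, 0.0) + row["amount"]

print(totals)  # {'Electronics': 2000.0, 'Furniture': 300.0}
```

In SQL this would be a single join-and-group-by; the point is that the fact row itself carries no descriptive text, only keys and measures.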

Snowflake schema - dimensions are split into multiple (normalized) tables; performance normally degrades because of the extra joins. Hybrid schema - not all dimension tables are normalized. Optimal snowflake - the fact table relates to normalized dimensions optimized for performance. A star schema has better performance, with these advantages: no restriction on the number of attributes like we had earlier in databases, and data retrieval is easy.

03/01/2011-Day 3

RAC (refer net) - configure multiple systems with the same instance name (high availability). SOA - refer net.

ELT: extract the data, load (dump) it into the target, and then transform - the newer approach. Oracle Data Integrator is ELT (this is the future). ETL is more common: extraction, transformation, then loading. ETL is also used in data migration projects - moving data from one application to another, often because the source is a legacy system. Informatica cannot validate an address by itself, i.e. strictly speaking Informatica is not perfect.
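The difference between the two orderings can be shown with a toy pipeline. Everything here is an illustrative assumption (the trivial "transform" just trims and uppercases); only the ordering matters:

```python
def extract(source):
    """Pull raw rows out of the source system."""
    return list(source)

def transform(rows):
    """A stand-in cleansing step: trim whitespace, normalize case."""
    return [r.strip().upper() for r in rows]

def load(rows, target):
    """Append rows into the target store."""
    target.extend(rows)
    return target

source = [" alice ", " bob "]

# ETL: transform in the middle tier, then load the clean data.
etl_target = load(transform(extract(source)), [])

# ELT: load the raw data first, then transform inside the target.
elt_target = load(extract(source), [])
elt_target[:] = transform(elt_target)

assert etl_target == elt_target == ["ALICE", "BOB"]
```

Both routes end with the same clean data; ELT simply pushes the transformation work down into the target database engine.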

Informatica is a data integration tool, built on Java. Factors influencing ETL architecture: volume at each warehouse component; the time window available for extraction (a transactional system is usually 24/7, so extraction is given a window period - is that time sufficient to retrieve the data?); extraction type (initial/full vs incremental/periodic).

Changed data capture (net). SCDs: dimensions don't get updated very frequently. 3 types of SCDs: 1. Type 1 - just updates the row, no history. 2. Type 2 - maintains history using a surrogate key (row addition); variants track the current row with a flag (N/Y), a version number, or a timestamp. Surrogate keys are used only while joining fact and dimension tables, not for data retrieval or reporting. 3. Type 3 - effective date and limited history by introducing an extra column.
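A Type 2 update can be sketched with the flag variant described above. The table layout, column names and values are illustrative assumptions:

```python
# Toy dimension: surrogate key (sk), natural key (cust_id),
# tracked attribute (city), and a current-row flag.
customers = [{"sk": 1, "cust_id": "C1", "city": "Pune", "current": "Y"}]

def scd_type2_update(rows, cust_id, new_city, next_sk):
    """Type 2: expire the current row (flag -> N) and append a new row
    with a fresh surrogate key, so history is preserved."""
    for row in rows:
        if row["cust_id"] == cust_id and row["current"] == "Y":
            row["current"] = "N"  # expire the old version
    rows.append({"sk": next_sk, "cust_id": cust_id,
                 "city": new_city, "current": "Y"})

# A Type 1 change would simply overwrite 'city' in place, losing history.
scd_type2_update(customers, "C1", "Mumbai", next_sk=2)
assert [r["current"] for r in customers] == ["N", "Y"]
assert customers[1]["city"] == "Mumbai"
```

Fact rows loaded before the change keep pointing at surrogate key 1 (the Pune version), which is exactly how the surrogate key preserves history at join time.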

Day 4. Create and configure a DB -> Next. [An instance is a combination of processes and memory; processes: SMON, PMON. All processes combined together give the RDBMS.] The transaction log maintains all transaction history. Desktop class; Oracle base D:\. sys/sys are the username/pwd.

The repository is accessed through the server from client tools. We have SOA. MS Access does not satisfy most of Codd's rules and hence is not an RDBMS. A service is something provided on demand. Informatica 8.6 is a fail-proof service. The primary administrative unit is a domain; under this we have nodes.

[Diagram: a DOMAIN containing Node 1 and Node 2.]

Under a node, we have an application service and the service manager.

A node is an individual server. The service manager does authorization and authentication, and resides on the primary (master) gateway node.

05/01/2011-Day 5

Superglue - reporting on the metadata itself. PowerExchange - mainframe data access. The admin console is a main difference between 7 & 8: in 7 it is Windows-based, in 8 it is web-based. Designer (we are laying pipelines) [cleansing rules, transformation rules]. Workflow Manager - create tasks (sessions) and workflows (the pumping mechanism); the Informatica server powers this in 7, the integration service in 8. We create a repository in the rep-server admin console, and Repository Manager helps to manage it. Workflow Monitor helps to monitor sessions.

Power Center 7 architecture: ETL - data is integrated and transformed from one operational system to another or to the DW. Domain - the primary administrative unit of an Informatica server; a collection of nodes and services. Node - a single server. The integration service interacts between the repository database and the Informatica server.

[Diagram: a domain containing Grid 1 and Grid 2.]

One node acts as a gateway to receive service requests. Note: refer to the admin PDF. To migrate from 7.x to 8.x, first migrate from 7.6 to 8.1, then from 8.1 to 8.6 (or whatever the target is). Creating a repository means creating a repository service. Create two users in Oracle.

08/01/2011- Day 6 Installation (pc86-Win32-X86) Server Ok [English language]

conn sys as sysdba
Create user domain identified by domain;
Grant dba to domain; (userid/pwd -> domain)
Create user infa identified by infa;
Grant dba to infa; (userid/pwd -> infa)

Next -> install PowerCenter 8.6 (license key -> 2nd folder -> Oracle prod key), uwin. No need to enable HTTPS. Create new domain -> Next. DB type: Oracle. DB URL: BLRLXP (computer name) : 1521. domain (user id), domain (pwd), orcl. [Test connection]

Auto config Node, domain raviss/Winner-09 (uncheck)- (Run informatica services on the same. Admin console(raviss/Winner)

Create new user repository service dbuser- infa dbpwd- infa Service name: Practice License: Node:

Create new repository Create as global repository Enable version control

Switch from exclusive to normal mode in the general properties. Create the IS [Training_IS].

Rep user name: Administrator Rep pwd: Administrator Data movement mode: ASCII

11/01/2011

Day 7

SOA: send a request, an acknowledgement is returned [read about SOA on the net]. BLOB, CLOB [large objects] & NCLOB (on the net).

Multiple paragraphs of data can be stored in BLOB & CLOB. 11 tables get created when the domain (PCSF) is created. Repository metadata tables are like OCPB tables; clients go to the repository to fetch metadata. Pipelines are laid in the Designer. To run a workflow, we use the integration service. The primary gateway node also does authentication, authorization, logging etc., in order. Application services: the SAP BW service can run on only one node; integration services interact between client and repository. Transient failures [read on the net]. Partitions - split the data so each piece is small and can be processed easily. Dispatch mode - as and when a request arrives, dispatch it. Resource provision thresholds. Native DB drivers - connectivity is provided by the DB itself for us to connect to.

12/01/11 Day 8

Note: Erwin is a data modelling tool. Source Analyzer - importing just the structure (columns and data types). Target Designer. Transformation Developer - reusable transformations. Mapplet Designer - reusable components. Mapping Designer - mapping fetched data from the source to load into the target. Transformations - a set of instructions to the IS to perform certain tasks; a transformation is a repository object that performs a specific function. Transformations have to have a source and a target to run.

2 types: Active - the number of records that pass through can change (multi-row functions). Passive - does not change the number of records that pass through (single-row functions). Connected - part of the pipeline and connected to other transformations. Unconnected - not connected to other transformations in the pipeline.
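The active/passive distinction is just about row counts, which a toy example makes concrete. The function names and the tax/filter logic are illustrative assumptions, loosely modelled on an Expression and a Filter transformation:

```python
rows = [{"amt": 50}, {"amt": 150}, {"amt": 250}]

# Passive: exactly one output row per input row (single-row function,
# in the spirit of an Expression transformation).
def add_tax(rows):
    return [{**r, "with_tax": round(r["amt"] * 1.1, 2)} for r in rows]

# Active: the output row count can differ from the input
# (in the spirit of a Filter transformation).
def keep_large(rows):
    return [r for r in rows if r["amt"] > 100]

assert len(add_tax(rows)) == len(rows)  # passive: count unchanged
assert len(keep_large(rows)) == 2       # active: count changed
```

Aggregators, sorters with distinct, and routers are active for the same reason: what comes out is not row-for-row what went in.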

SQ (Source Qualifier) transformation: it comes along with the source definition. The SQ acts as an interpreter of datatypes (input conversion) and can be used to eliminate duplicate records. Tracing level [level of logs generated]: Terse (lowest level of log), Normal (more details than terse), Verbose initial (more details but not entire records), Verbose data (complete data in the session log).
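The duplicate-elimination effect (comparable to selecting distinct rows in the SQ) can be sketched as follows; the function and data are illustrative assumptions:

```python
def distinct(rows):
    """Keep the first occurrence of each identical row, preserving order -
    the same effect as a SELECT DISTINCT on the source."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

data = [{"id": 1}, {"id": 2}, {"id": 1}]
print(distinct(data))  # [{'id': 1}, {'id': 2}]
```

In the real tool this is a checkbox on the Source Qualifier rather than hand-written code; the database then does the deduplication during extraction.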

Pre-SQL - when loading data from a table, introduce an index (to speed the read). Post-SQL - drop the index after loading the data. A port is a column in Informatica.

14/01/11

Day 9

When you import from a flat file, always give extra precision.
