[Diagram: source tables and their columns]
(unlabeled attributes): acct_start_date, income, first_name, acct_end_date, age, last_name, ref_acct_nbr, years_with_bank, gender, Empno, nbr_children, marital_status, gender, street_nbr, marital_status, street_name, postal_code, city_name, state_code
SAVINGS: acct_nbr, acct_type, cust_id, ref_acct_nbr, Empno, minimum_balance, per_check_fee, account_active, acct_start_date, acct_end_date, starting_balance, ending_balance
CHECKING: acct_nbr, acct_type, cust_id, ref_acct_nbr, Empno, minimum_balance, account_active, acct_start_date, acct_end_date, starting_balance, ending_balance
LOAN: acct_nbr, cust_id, Agent_id, credit_limit, credit_rating, account_active, acct_start_date, acct_end_date, starting_balance, ending_balance
CREDIT: acct_nbr, cust_id, Agent_id, credit_limit, credit_rating, account_active, acct_start_date, acct_end_date, starting_balance, ending_balance
SAVING_TRAN: Tran_Id, Cust_Id, Acct_Nbr, Channel_Nbr, Session_Id, Tran_Duration, Tran_Amt, Principal_Amt, Interest_Amt, New_Balance, Tran_Date, Tran_Time, Channel, Tran_Code
CHECK_TRAN: Tran_Id, Cust_Id, Acct_Nbr, Channel_Nbr, Session_Id, Tran_Duration, Tran_Amt, Principal_Amt, Interest_Amt, New_Balance, Tran_Date, Tran_Time, Channel, Tran_Code
AGENT: Agent_id, Agent_name, Agent_type, Location
LOAN_TRAN: Tran_Id, Cust_Id, Acct_Nbr, Channel_Nbr, Session_Id, Tran_Duration, Tran_Amt, Principal_Amt, Interest_Amt, New_Balance, Tran_Date, Tran_Time, Channel, Tran_Code
CREDIT_TRAN: Tran_Id, Cust_Id, Acct_Nbr, Channel_Nbr, Session_Id, Tran_Duration, Tran_Amt, Principal_Amt, Interest_Amt, New_Balance, Tran_Date, Tran_Time, Channel, Tran_Code
[Diagram: dimension tables and their columns]
PRODUCT_DIM: Acct_Key, Acct_nbr, Acct_type, Acct_start_date, Acct_end_date, Trans_code, Trans_id, Earnings, Transaction_fee, Service diagram, Account_active, Account_bal_credit, Channel, Ref_acct_nbr
DATE_DIM: Date_key, DT_calender_Date, DT_weekday_full, DT_weekend_full, DT_calen_week_numb, DT_calen_month_numb, DT_calen_qtr_numbr, DT_calen_monthend, DT_calen_quater_number_month, DT_calen_year_nmbr, DT_calen_FISICALYear
CUSTOMER_DIM: Cust_Key, Cust_id, Name, Income, Age, Year_with_bank, nbr_children, gender, marital_status, acct_start_date, acct_end_date, street_number, street_name, customer_effi_points, customer_track_points, customer_ref_points
Profit_on_loan_credit
TRANSACTION_DIM: Trans_Key, Trans_id, Trans_code, Channel_nbr, Agent_id, Session_IT, Transaction_charge, Transaction_amt, Transaction_time
Dimension Overview

Based on the business requirements just listed, the grain and dimensionality of the initial model begin to emerge. We start with a core fact table that records the primary balances of every account at the end of each month. The grain of the fact table is therefore one row for each account at the end of each month. Based on this grain declaration, we initially envision a design with only two dimensions: month and account.

A data-centric designer might argue that all the other descriptive information, such as household, branch, and product characteristics, should be embedded as descriptive attributes of the account dimension because each account has only one household, branch, and product associated with it. While this schema accurately represents the many-to-one and many-to-many relationships in the snapshot data, it does not adequately reflect the natural business dimensions. Rather than collapsing everything into the huge account dimension table, additional analytic dimensions such as product and branch mirror the instinctive way that banking users think about their business. These supplemental dimensions provide much smaller points of entry to the fact table, so they address both the performance and usability objectives of a dimensional model.

Finally, given that the master account dimension in a big bank may approach 10 million members, we worry that type 2 slowly changing dimension (SCD) effects could cause this huge dimension to grow into something unworkable. The product and branch attributes are convenient groups of attributes to remove from the account dimension in order to cut down on the type 2 SCD effects. Later we'll squeeze the changing demographics and behavioral attributes out of the account dimension for the same reasons.
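The grain declaration can be made concrete with a small sketch. The following is a minimal illustration in Python using an in-memory SQLite database, not the report's actual schema: the table and column names (month_dim, account_dim, balance_snapshot_fact, primary_balance) and the sample dates are assumptions chosen for the example. A composite primary key on (month_key, account_key) enforces one row per account per month end.

```python
import sqlite3

# Hypothetical table and column names; only the grain rule comes from the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE month_dim   (month_key INTEGER PRIMARY KEY, month_end_date TEXT);
CREATE TABLE account_dim (account_key INTEGER PRIMARY KEY, acct_nbr TEXT);
-- Grain: one row per account per month end, enforced by the composite key.
CREATE TABLE balance_snapshot_fact (
    month_key       INTEGER NOT NULL REFERENCES month_dim(month_key),
    account_key     INTEGER NOT NULL REFERENCES account_dim(account_key),
    primary_balance REAL    NOT NULL,
    PRIMARY KEY (month_key, account_key)
);
INSERT INTO month_dim VALUES (1, '2024-01-31'), (2, '2024-02-29');
INSERT INTO account_dim VALUES (10, 'A-10');
INSERT INTO balance_snapshot_fact VALUES (1, 10, 500.0);
""")


def violates_grain(month_key, account_key, balance):
    """Try to insert a snapshot row; report whether it breaks the grain."""
    try:
        conn.execute(
            "INSERT INTO balance_snapshot_fact VALUES (?, ?, ?)",
            (month_key, account_key, balance),
        )
        return False
    except sqlite3.IntegrityError:
        # A second row for the same account and month is rejected.
        return True
```

Calling violates_grain(1, 10, 999.0) is rejected because account 10 already has a January row, while violates_grain(2, 10, 520.0) succeeds because it adds the February snapshot for the same account.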
The product and branch dimensions are two separate dimensions because there is a many-to-many relationship between products and branches. They both change slowly but on different rhythms. Most important, business users think of them as basic, distinct dimensions of the banking business.

Based on further study of the bank's requirements, we ultimately choose the following dimensions for our initial schema: month end date, account, household, branch, product, and status. At the intersection of these six dimensions, we take a monthly snapshot and record the primary balance and any other metrics that make sense across all products, such as interest paid, interest charged, and transaction count. Remember that account balances are just like inventory balances in that they are not additive across any measure of time. Instead, we must average the account balances by dividing the balance sum by the number of months.

Product Dimension

The product dimension consists of a simple product hierarchy that describes all the bank's products, including the name of the product, type, and category. The need to construct a generic product categorization in the bank is the same need that causes grocery stores to construct a generic merchandise hierarchy. The main difference between the bank and grocery store examples is that the bank also develops a large number of custom product attributes for each product type. We'll defer discussion regarding the handling of these custom attributes until the end of this chapter.

The account status dimension is a useful dimension to record the condition of the account at the end of each month. The status records whether the account is active or inactive or whether a status change occurred during the month, such as a new account opening or an account closure. Rather than whipsawing the large account dimension or merely embedding a cryptic status code or abbreviation directly in the fact table, we treat status as a full-fledged dimension with descriptive status decodes, groupings, and status reason descriptions as appropriate. In many ways we could consider the account status dimension to be another example of a minidimension.

Customer Dimension

Rather than focusing solely on the bank's accounts, users also want the ability to analyze the bank's relationship with a customer. They are interested in understanding the overall profile of a customer, the magnitude of the existing relationship with the customer, and what additional products should be sold to the customer. Demographic attributes of the customer, such as income and marital status, change over time; as you might suspect, the users want to track the changes. If the bank focuses on accounts for commercial entities rather than consumers, it likely has similar requirements to identify and link corporate families.

From the bank's perspective, a customer may comprise several accounts and individual account holders. For example, consider John and Mary Smith as a single customer household. John has a checking account, and Mary has a savings account. In addition, John and Mary have a joint checking account, credit card, and mortgage with the bank. All five of these accounts are considered to be a part of the same Smith household despite the fact that minor inconsistencies may exist in the operational name and address information.

The process of relating individual accounts to households (or the commercial business equivalent of a residential household) is not to be taken lightly. Householding requires the development of business rules and algorithms to assign accounts to households. There are specialized products and services to do the matching necessary to determine household assignments. It is very common for a large financial services organization to invest significant resources in specialized capabilities to support its householding needs.
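To illustrate the earlier point that month-end balances are not additive across time, here is a minimal sketch in Python with SQLite. The table name balance_snapshot_fact, its columns, and the sample figures are assumptions made up for the example. The function sums the balances and divides by the number of months, as the text prescribes, rather than by the number of rows.

```python
import sqlite3

# Hypothetical snapshot table with two accounts over three month ends.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balance_snapshot_fact "
    "(month_key INTEGER, account_key INTEGER, primary_balance REAL)"
)
conn.executemany(
    "INSERT INTO balance_snapshot_fact VALUES (?, ?, ?)",
    [
        (1, 10, 100.0), (2, 10, 200.0), (3, 10, 300.0),
        (1, 11, 50.0), (2, 11, 50.0), (3, 11, 50.0),
    ],
)


def average_monthly_balance(conn, first_month, last_month):
    """Sum the balances, then divide by the number of months in the range."""
    total, months = conn.execute(
        "SELECT SUM(primary_balance), COUNT(DISTINCT month_key) "
        "FROM balance_snapshot_fact WHERE month_key BETWEEN ? AND ?",
        (first_month, last_month),
    ).fetchone()
    if not months:
        return None  # no snapshots in the requested range
    return total / months
```

With this sample data the month-end totals are 150, 250, and 350, so average_monthly_balance(conn, 1, 3) gives 250.0. A plain AVG over the six rows would divide by the row count instead and understate the bank-wide balance.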
We decide to treat account and customer as separate dimensions because of the size of the account dimension and the volatility of the account constituents within a household dimension, as referenced earlier. In a large bank, the account dimension is huge, with easily over 10 million rows that group into several million households. The customer dimension provides a somewhat smaller point of entry into the fact table without traversing a 10-million-row account dimension table. In addition, given the changing nature of the relationship between accounts and customers, we elect to use the fact table to capture the relationship rather than merely including the household attributes on each account dimension row. In this way we avoid using the type 2 SCD approach with the large account dimension.

Various Dimensions

So far we have discussed customer and product analysis. The bank also needs to analyze agents, transactions, and employees. Agent information is maintained with history and organized by agent location, so that policies can be passed from one agent to another. Transactions are maintained at a daily grain for credit accounts; each record should carry the complete details of the credit transaction, along with employee information by bank location.

Time Dimension

So far we've restricted our discussions in this financial services chapter to month-end balance snapshots because this level of detail typically is sufficient for analysis. If required, we could supplement the monthly-grained snapshot fact table with a second fact table that provides merely the most current snapshot as of the last nightly update or perhaps is extended to provide daily-balance snapshots for the last week or month. However, what if we face the requirement to report an account's balance at any arbitrarily picked historical point in time? Creating daily-balance snapshots for a large bank over a lengthy historical time span would be overwhelming given the density of the snapshot data. If the bank has 10 million accounts, daily snapshots translate into approximately 3.65 billion fact rows per year.

Assuming that business requirements already have driven the need to make transaction detail data available for analysis, we could leverage this transaction detail to determine an arbitrary point-in-time balance. To simplify matters, we'll boil the account transaction fact table down to an extremely simple design. The transaction type key joins to a small dimension table of permissible transaction types. The transaction sequence number is a continuously increasing number running for the lifetime of the account. The final flag indicates whether this is the last transaction for an account on a given day. The transaction amount is self-explanatory. The balance fact is the ending account balance following the transaction event.

In this design we take advantage of a special property of the surrogate date key. The date key is a set of integers running from 1 to N with a meaningful, predictable sequence. We assign consecutive integers to the date surrogate key so that we can physically partition a large fact table based on the date. This neatly segments the fact table so that we can perform discrete administrative actions on certain date ranges, such as moving archived data to offline storage or dropping and rebuilding indexes. The date dimension is the only dimension whose surrogate keys have any embedded semi-intelligence. Due to its predictable sequence, it is the only dimension on which we dare place application constraints. This ordering is what allows a query to locate the most recent prior end-of-day transaction.

Fact Overview

The heterogeneous product technique just discussed is appropriate for fact tables in which a single logical row contains many product-specific facts. Snapshots usually fit this pattern.
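The point-in-time balance lookup described under Time Dimension can be sketched as follows. This is a minimal illustration in Python with SQLite, not the original query: the table name account_transaction_fact, its column names, and the sample rows are assumptions based on the prose description. It relies on the date key being an ordered integer, so the most recent end-of-day row on or before the requested date is the one with the largest qualifying date_key.

```python
import sqlite3

# 10 million accounts x 365 days, the daily-snapshot volume cited in the text.
DAILY_SNAPSHOT_ROWS_PER_YEAR = 10_000_000 * 365

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_transaction_fact ("
    " account_key INTEGER, date_key INTEGER, transaction_type_key INTEGER,"
    " transaction_sequence_number INTEGER, final_flag INTEGER,"
    " transaction_amount REAL, balance REAL)"
)
conn.executemany(
    "INSERT INTO account_transaction_fact VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (10, 1, 1, 1, 0, 100.0, 100.0),  # first transaction on day 1
        (10, 1, 2, 2, 1, -30.0, 70.0),   # last transaction on day 1
        (10, 3, 1, 3, 1, 50.0, 120.0),   # only transaction on day 3
        (10, 6, 2, 4, 1, -20.0, 100.0),  # only transaction on day 6
    ],
)


def balance_as_of(conn, account_key, date_key):
    """Return the end-of-day balance on or before date_key, or None."""
    row = conn.execute(
        "SELECT balance FROM account_transaction_fact"
        " WHERE account_key = ? AND final_flag = 1 AND date_key = ("
        "   SELECT MAX(date_key) FROM account_transaction_fact"
        "   WHERE account_key = ? AND final_flag = 1 AND date_key <= ?)",
        (account_key, account_key, date_key),
    ).fetchone()
    return row[0] if row else None
```

For the sample rows, balance_as_of(conn, 10, 5) falls back to the day 3 closing balance because no transaction occurred on days 4 or 5, and a date before the first transaction yields None.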
On the other hand, transaction-grained fact tables often have a single fact that is generically the target of a particular transaction. In such cases the fact table has an associated transaction dimension that interprets the amount column. In the case of transaction-grained fact tables, we typically do not need specific line-of-business fact tables. We get by with only one core fact table because there is only one fact.

However, we still can have a rich set of heterogeneous products with diverse attributes. In this case we would generate the complete portfolio of custom product dimension tables and use them as appropriate, depending on the nature of the application. In a cross-product analysis, we would use the core product dimension table because it is capable of spanning any group of products. In a single-product analysis, we optionally could use the custom product dimension table instead of the core dimension if we wanted to take advantage of the custom attributes specific to that product type.
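The choice between the core and custom product dimensions can be sketched with one fact table joined to either dimension. This is a minimal illustration in Python with SQLite under stated assumptions: the table names (core_product_dim, checking_product_dim, transaction_fact), the product names, and the use of per_check_fee as a checking-specific attribute are hypothetical choices for the example. Both dimensions share the same product_key values, which is what lets the single fact table join to whichever one the analysis needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Core dimension: attributes that make sense for every product.
CREATE TABLE core_product_dim (
    product_key INTEGER PRIMARY KEY, product_name TEXT, product_type TEXT);
-- Custom dimension: checking products only, with a custom attribute.
CREATE TABLE checking_product_dim (
    product_key INTEGER PRIMARY KEY, product_name TEXT, per_check_fee REAL);
-- One core fact table with a single generic amount.
CREATE TABLE transaction_fact (product_key INTEGER, amount REAL);
INSERT INTO core_product_dim VALUES
    (1, 'Basic Checking', 'CHECKING'), (2, 'Gold Savings', 'SAVINGS');
INSERT INTO checking_product_dim VALUES (1, 'Basic Checking', 0.25);
INSERT INTO transaction_fact VALUES (1, 40.0), (1, 60.0), (2, 500.0);
""")


def amount_by_product_type(conn):
    """Cross-product analysis: join the fact table to the core dimension."""
    return dict(conn.execute(
        "SELECT p.product_type, SUM(f.amount) FROM transaction_fact f"
        " JOIN core_product_dim p ON p.product_key = f.product_key"
        " GROUP BY p.product_type"))


def checking_amount_by_fee(conn):
    """Single-product analysis: join to the custom checking dimension."""
    return dict(conn.execute(
        "SELECT c.per_check_fee, SUM(f.amount) FROM transaction_fact f"
        " JOIN checking_product_dim c ON c.product_key = f.product_key"
        " GROUP BY c.per_check_fee"))
```

The inner join to the custom dimension naturally restricts the result to checking products, since only their keys appear in that table.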
SYSTEM DEVELOPMENT
Staging Column Specifications
Target Table Name: CUSTOMER_DETAILS
Table headers: Column Name | PK / FK | Format | Null

Data Source Specifications
Source File Name: CUSTOMER_DETAILS
Table headers: Column Name | PK / FK | Format | Null | File / Table Name | Field
File / Table Name for every row: CUSTOMER_DETAILS
Mapping for every row: 1-to-1

Key column: EMP_KEY (PK / FK), source EMPNO, lookup on EMP_BANKDIM(EMP_KEY, EMPNO)
ORACLE USERNAME

Column names: STREET_NBR, STREET_NAME, POSTAL_CODE, CITY_NAME, STATE_CODE, NAME_PREFIX, FIRST_NAME, LAST_NAME, GENDER, MARITAL_STATUS
Formats (row alignment not recoverable): NUMBER, DATE, VARCHAR2(30) x 8
Null flags (row alignment not recoverable): N, N, N, Y, Y; Y x 10
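The EMP_KEY lookup on EMP_BANKDIM(EMP_KEY, EMPNO) noted in the specification amounts to replacing a natural key with a surrogate key during the staging load. A minimal sketch in plain Python follows; the function names, the sample rows, and the choice to set aside unmatched rows are assumptions for illustration, not the report's actual ETL logic.

```python
def build_key_lookup(dimension_rows):
    """Map the natural key EMPNO to the surrogate key EMP_KEY."""
    return {empno: emp_key for emp_key, empno in dimension_rows}


def resolve_emp_keys(staging_rows, lookup):
    """Attach EMP_KEY to each staging row; collect rows with no match."""
    resolved, rejected = [], []
    for row in staging_rows:
        emp_key = lookup.get(row["EMPNO"])
        if emp_key is None:
            rejected.append(row)  # EMPNO not present in the dimension
        else:
            resolved.append({**row, "EMP_KEY": emp_key})
    return resolved, rejected


# Hypothetical (EMP_KEY, EMPNO) pairs standing in for EMP_BANKDIM.
emp_bankdim = [(1, 7369), (2, 7499)]
staging = [
    {"EMPNO": 7499, "FIRST_NAME": "A"},
    {"EMPNO": 9999, "FIRST_NAME": "B"},
]
resolved, rejected = resolve_emp_keys(staging, build_key_lookup(emp_bankdim))
```

Here the first staging row picks up EMP_KEY 2 and the second is set aside, since its EMPNO has no entry in the dimension.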
[Diagram: Bank Transactions, with Customer, Time, Transaction, and Product]
Here is the system generated schema for the Bank Product Analysis.
5.4 SCHEMA
(table name not recovered)
Column Name
Acct_nbr (PK)
Acct_type
Cust_id
Acct_start_date
Acct_end_date
Ref_acct_nbr
Empno

Checking_acct
Column Name           Data Type
Acct_nbr (PK)         Number(3)
Acct_type             Varchar2(2)
Cust_id               Number
Ref_acct_nbr          Number(16)
Empno                 Number(3)
Minimum_balance       Number(9,2)
Per_check_fee         Number(9,2)
Account_active        Varchar2(1)
Account_start_date    Date
Acct_end_date         Date
Starting_balance      Number(9,2)
Ending_balance        Number(9,2)

(table name not recovered; checking transaction columns)
Column Name           Data Type
Acct_nbr (PK)         Number(3)
Acct_nbr              Number(16)
Channel_nbr           Number
Session_id            Number(9,2)
Check_nbr             Number
Tran_duration         Number
Tran_amt              Number(9,2)
Principal_amt         Number(9,2)
Interest_amt          Number(9,2)
New_balance           Number(9,2)
Tran_date             Date
Tran_time             Varchar2(6)
Channel               Varchar2(1)
Tran_code             Varchar2(2)

Savings_acct
Column Name           Data Type
Acct_nbr (PK)         Number(16)
Acct_type             Varchar2(2)
Cust_id               Number
Ref_acct_nbr          Number(16)
Empno                 Number(3)
Minimum_balance       Number(9,2)
Account_active        Varchar2(1)
Acct_start_date       Date
Acct_end_date         Date
Starting_balance      Number(9,2)
Ending_balance        Number(9,2)

Savings_tran
Column Name           Data Type
Tran_id (PK)          Number
Cust_id               Number
Acct_nbr              Number(16)
Channel_nbr           Number
Session_id            Number(9,2)
Tran_duration         Number
Tran_amt              Number(9,2)
Principal_amt         Number(9,2)
Interest_amt          Number(9,2)
New_balance           Number(9,2)
Tran_date             Date
Tran_time             Varchar2(6)
Channel               Varchar2(1)
Tran_code             Varchar2(2)

Credit_acct
Column Name           Data Type
Acct_nbr (PK)         Number(16)
Agent_id              Number
Cust_id               Number
Credit_limit          Number(9,2)
Credit_rating         Number
Minimum_balance       Number(9,2)
Account_active        Varchar2(1)
Acct_start_date       Date
Acct_end_date         Date
Starting_balance      Number(9,2)
Ending_balance        Number(9,2)

Credit_tran
Column Name           Data Type
Tran_id (PK)          Number
Cust_id               Number
Acct_nbr              Number(16)
Channel_nbr           Number
Agent_id              Number(9,2)
Session_id            Number(9,2)
Tran_duration         Number
Tran_amt              Number(9,2)
Principal_amt         Number(9,2)
Interest_amt          Number(9,2)
New_balance           Number(9,2)
Tran_date             Date
Tran_time             Varchar2(6)
Channel               Varchar2(1)
Tran_code             Varchar2(2)

Banking_services
Column Name
Trans_id
Acct_nbr
Service
Tran_amt
Tran_charge
Tran_tot_amt

Bank_trans_source
Column Name
cust_id
Acct_nbr