This document contains Confidential, Proprietary and Trade Secret Information (Confidential Information) of Informatica Corporation and may not be copied, distributed, duplicated, or otherwise reproduced in any manner without the prior written consent of Informatica. While every attempt has been made to ensure that the information in this document is accurate and complete, some typographical errors or technical inaccuracies may exist. Informatica does not accept responsibility for any kind of loss resulting from the use of information contained in this document. The information contained in this document is subject to change without notice. The incorporation of the product attributes discussed in these materials into any release or upgrade of any Informatica software product, as well as the timing of any such release or upgrade, is at the sole discretion of Informatica. Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670; 6,339,775; 6,044,374; 6,208,990; 6,850,947; 6,895,471; or by the following pending U.S. Patents: 09/644,280; 10/966,046; 10/727,700. This edition published July 2006.
White Paper
Table of Contents

Executive Summary
The Business Challenges of Mainframe Migration
The Technical Challenges of Mainframe Migration
The Seven Approaches to Mainframe Migration
Data Migration Project Challenges, Methodologies, and Tools
    Mainframe Data Migration Project Challenges
    Data Migration Methodologies and Tools
Executive Summary
For more than 40 years, companies have deployed mission-critical business applications on the mainframe. Many of these applications have been built for non-relational database management systems (DBMSs) as well as for relational sources of data on the mainframe, such as DB2. Yet according to Gartner, the installed base for pre-relational database management systems has been declining. The useful life of pre-relational mainframe database management system engines is coming to an end because of a diminishing application and skills base, and increasing costs. Although the installed base for pre-relational DBMSs is shrinking, the market share numbers from Gartner Dataquest show that the revenue is increasing. This is due primarily to increased prices from the vendors, currency conversions, and mainframe CPU replacement. In real numbers, the revenue is dropping as the number of customers and licenses decreases.1

Many companies have migrated mission-critical applications off the mainframe onto open standard relational database management systems (RDBMSs) like Oracle for a variety of reasons: limited application support from independent software vendors (ISVs) and a shrinking resource base, for example. Regardless of the reasons why your business has elected to move off the mainframe, once the decision has been made, your IT organization needs to know about the approaches, techniques, and tools required to migrate successfully to a more modern application landscape or open standard RDBMS like Oracle. This is where this white paper can help.

This white paper examines both the business and technical challenges of migrating off the mainframe. It outlines the seven mainframe migration approaches that IT organizations can use to develop their migration strategies. It explores common data migration project methodologies and tools, and suggests ways to convert a serial approach to migration into a more effective, iterative process. Finally, it describes how IT organizations can use Informatica enterprise data integration software to migrate effectively off the mainframe to more modernized systems. While most of the practices discussed in this paper apply to migrations from mainframe-based legacy applications to any relational database management system, this paper focuses specifically on migrations to Oracle environments.
1 Mattern, Thomas and Matthias Haendly. "ESA: A 2005 'Business-Savvy' Take on SOAs," Integration Developer News, February 9, 2005.
The Seven Approaches to Mainframe Migration
1. One-Time Bulk Offload. This approach is typically used for testing and staging environments. Data is migrated off the mainframe as a one-time data movement event, often completed during a lean period such as a weekend or the early morning hours. This approach requires considerable advance planning, especially since the entire bulk data load has to be moved within a short window of time. Typically, it is used for the initial data load when testing the migration, and again after all testing is complete, just before moving the system to production.
2. Incremental Delta Offload. In this approach, data is migrated off the mainframe in batches. After the initial data movement, the goal is to bring over changes made to the mainframe system data on a periodic basis (e.g., daily, weekly, or monthly). The challenge of this approach is to identify the changes made on the mainframe and selectively extract just those changes (see the sketch following this list).
3. Bi-directional Replication Synchronization. In this approach, two production systems, the mainframe system and the Oracle system, run in parallel, each holding data and replicating it to the other. The challenge of this approach is to support both batch and real-time bi-directional integration, since in many cases both systems will be running for years before the mainframe is shut down. It's important to note that in this scenario, business decisions must be made well in advance of implementing replication to determine the master/slave relationship in the bi-directional transaction. Otherwise, unpredictable updates could occur to the source system, the target system, or both.
4. Physical Federation. This approach involves multiple data sources (e.g., VSAM, DB2, IMS, Datacom, ADABAS), which must be read and joined to produce a single view of the data inside an Oracle RDBMS. Data is still stored in the respective mainframe data stores, but the Oracle system becomes the single version of the truth. This is a popular approach when the packaged application replacing the mainframe may not have all the functionality of the mainframe, or when parts of the mainframe system are so complicated that they cannot be replaced for years to come. This approach facilitates phased migration of mission-critical infrastructure: companies keep their mainframe systems, but by pulling the data into an Oracle system, they put in place a service-oriented architecture (SOA) for integration with the rest of the enterprise.
5. Virtual Federation. This approach is similar to the physical federation model, but instead of loading all the data into an Oracle RDBMS, the data from all the mainframe sources is joined virtually to provide a just-in-time single view to the consuming applications or users. This approach is sometimes called Enterprise Information Integration (EII).
6. Oracle Transactions on Mainframe. In this approach, the Oracle system becomes the primary system of record for business process execution, but some critical business functionality still resides on the mainframe. New transactions are first processed on the Oracle system, and related mainframe updates are then executed by initiating batch jobs or through such on-line transaction systems as Customer Information Control System (CICS) or Information Management System/Transaction Manager (IMS/TM), formerly known as IMS/Data Communications (IMS/DC).
7. Mainframe Transactions on Oracle. In this approach, functionality moves gradually to the Oracle system, but the mainframe remains the primary system of record for business process execution. Because some functionality and data have already moved to the Oracle system, CICS, IMS, and/or batch mainframe transactions still need to access data in the Oracle database.
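To make approach 2 concrete, the sketch below shows one common way to pick up deltas when the source table carries a last-modified timestamp. It is a minimal illustration only: the table, columns, and DB-API-style connections are hypothetical, and a timestamp-based scan will not capture deletes, which is one reason dedicated change data capture tooling is often preferred.

```python
# Minimal sketch of an incremental delta offload (approach 2), assuming
# the source table exposes a LAST_UPDATED timestamp column. All table,
# column, and connection names here are hypothetical.
import datetime

CHECKPOINT_FILE = "last_offload.txt"

def read_checkpoint() -> str:
    """Return the high-water mark (ISO timestamp) of the previous run."""
    try:
        with open(CHECKPOINT_FILE) as f:
            return f.read().strip()
    except FileNotFoundError:
        return "1900-01-01T00:00:00"  # first run: take everything

def extract_delta(source_conn, target_conn) -> int:
    """Copy rows changed since the last run into an Oracle-side staging table."""
    since = read_checkpoint()
    now = datetime.datetime.utcnow().isoformat(timespec="seconds")
    rows = source_conn.execute(
        "SELECT ACCOUNT_ID, BALANCE, LAST_UPDATED FROM ACCOUNTS "
        "WHERE LAST_UPDATED > ? AND LAST_UPDATED <= ?",
        (since, now),
    ).fetchall()
    target_conn.executemany(
        "INSERT INTO ACCOUNTS_STG (ACCOUNT_ID, BALANCE, LAST_UPDATED) "
        "VALUES (?, ?, ?)",
        rows,
    )
    target_conn.commit()
    # Advance the high-water mark only after a clean load.
    with open(CHECKPOINT_FILE, "w") as f:
        f.write(now)
    return len(rows)
```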
Data Migration Project Challenges, Methodologies, and Tools
Industry experience shows that a large share of software implementation projects fail or overrun their budgets and schedules. Of the projects that overrun, half exceed their timescales by 75 percent, and two-thirds exceed their overall project budgets. A major reason these failure rates are so high is that data migration is treated as a minor, one-time event within the overall implementation. Migration is not an industry-recognized area of expertise with an established body of knowledge and practices, nor have most companies built up any internal competency from which to draw. Organizations need to understand the unique challenges of migration projects and adopt an appropriate migration methodology to address and overcome these challenges.
In summary, during the upfront analysis of the source mainframe data, most of the assumptions about the data are proven wrong. Since sufficient time is rarely planned or allocated for analysis, any mapping specification from the mainframe to Oracle is hardly more than an intelligent guess. Extractions and transformations based on the initial mapping specification then run into changing target data requirements, requiring additional analysis and changes to the mapping specification. Validating the data against the various integrity and quality constraints also typically poses a challenge. If validation fails, the project goes back to further analysis and then additional rounds of extractions and transformations. When the data is finally ready to be loaded into the Oracle system, unexpected data scenarios often break the loading process and send the project back for more analysis, more extractions and transformations, and more validations.
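As a small illustration of the validation stage described above, the sketch below checks staged rows against a handful of integrity and quality constraints and routes failures back for analysis rather than letting them break the Oracle load. The field names and rules are hypothetical.

```python
# Minimal sketch of the validate stage: rows that fail integrity or
# quality constraints are routed back for analysis instead of breaking
# the Oracle load. Field names and rules are hypothetical.

def validate_row(row: dict) -> list[str]:
    """Return a list of constraint violations for one staged row."""
    errors = []
    if not row.get("ACCOUNT_ID"):
        errors.append("missing ACCOUNT_ID (primary key)")
    if row.get("BALANCE") is not None and row["BALANCE"] < 0:
        errors.append("negative BALANCE violates quality rule")
    if row.get("OPEN_DATE", "") > row.get("CLOSE_DATE", "9999-12-31"):
        errors.append("OPEN_DATE after CLOSE_DATE")
    return errors

def split_batch(rows):
    """Separate loadable rows from rejects that need further analysis."""
    clean, rejects = [], []
    for row in rows:
        errs = validate_row(row)
        if errs:
            rejects.append((row, errs))   # route back to analysis
        else:
            clean.append(row)             # safe to load into Oracle
    return clean, rejects

rows = [
    {"ACCOUNT_ID": "A1", "BALANCE": 100.0,
     "OPEN_DATE": "1998-03-01", "CLOSE_DATE": "2005-01-15"},
    {"ACCOUNT_ID": "", "BALANCE": -5.0,
     "OPEN_DATE": "2001-07-09", "CLOSE_DATE": "2003-02-02"},
]
clean, rejects = split_batch(rows)
print(f"{len(clean)} rows ready to load, {len(rejects)} rows back to analysis")
```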
Figure 1: The Data Migration Methodology Should Be Converted from a Serial Process into an Iterative Process (stages shown: Analyze, Extract/Transform, Load, Validate/Cleanse)
This iterative approach to data migration is best achieved by using a single, unified toolset or platform that leverages automation and provides functionality spanning all four stages. In an iterative process, there is a big difference between using a different tool for each stage and using one unified toolset across all four. When IT organizations use one unified toolset, the results of one stage can easily be carried into the next, enabling faster, more frequent, and ultimately fewer iterations, a key to success in a migration project. A single platform not only unifies the development team across the project phases, but also unifies the separate teams that may be handling each source system in a multi-source migration project.
Manual analysis methods range from visual inspection of source applications or sample data extracts to analysis via custom-coded reports or elaborate, intertwined spreadsheets. These data profiling methods typically sample data in a few key fields to get a sense of what the data in those columns is like, but the results are often inaccurate and incomplete. An inadequate toolset and a manual approach to profiling often lead to a data migration project that underestimates the scope, schedule, and resources required to properly analyze the source data systems. Figure 2 shows how a much more even distribution of project resources over the key project phases (e.g., analysis, build, and test) can generate savings; relying on the build or development phase to identify and fix data issues can increase costs tenfold.
Figure 2: Proactive Analysis of Source Data Saves Both Time and Money (phase distribution shown: Analysis 10%, Build 60%, Test 30%)
PowerCenter's data profiling capabilities provide comprehensive, accurate information about the content, quality, and structure of data in virtually any operational system. Organizations can automatically assess the initial and ongoing quality of data regardless of its location or type. With its comprehensive data profiling capabilities, PowerCenter:

- Reduces data quality assessment time with easy-to-use wizards and pre-built, metric-driven reports that comprise a single interface for the entire profiling process
- Addresses ongoing data quality in legacy applications with Web-based dashboards and reports that illustrate changes in data content, quality, structure, and values over time
- Ensures end-user data confidence by automatically and accurately profiling any data accessible to PowerCenter, virtually any and all enterprise data formats

Figure 3 shows an example of a PowerCenter data profiling report. The report shows how PowerCenter automatically infers the primary and foreign key relationships across three tables in a legacy application. It's important to note that PowerCenter can profile any data source that it can natively access, including mainframe tables.
Figure 3: PowerCenter Profiling Report Infers Primary Key and Foreign Key Relationships between Multiple Legacy Application Tables
PowerCenter's data profiling capabilities enable migration teams to conduct much more thorough analysis than manual profiling of the legacy systems allows. The platform provides the tools to automatically scan all records across all columns and tables in a source system and dynamically generate reports that make it easy to understand the true state of the data. These reports help migration teams determine whether the legacy data has quality issues and how to properly address them. Data profiling is important both before migration (upfront source system profiling) and after it (profiling the converted data for the Oracle application environment). PowerCenter's capabilities enable the profiling of data pre- and post-migration, validating the readiness of the mainframe data for Oracle.
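For readers who want a feel for what such reports compute, the sketch below hand-rolls two of the simplest profiling measures, null rates and candidate-key detection, over hypothetical columns. It is purely illustrative and does not reflect PowerCenter's implementation.

```python
# Minimal sketch of the kind of column profiling a tool automates:
# null rates, distinct counts, and candidate primary keys.
# The column data here is hypothetical.

def profile_column(name: str, values: list) -> dict:
    non_null = [v for v in values if v not in (None, "", " ")]
    distinct = set(non_null)
    return {
        "column": name,
        "rows": len(values),
        "null_rate": (1 - len(non_null) / len(values)) if values else 0.0,
        "distinct": len(distinct),
        # A column whose values are all present and unique is a
        # candidate primary key worth confirming with the business.
        "candidate_key": len(non_null) == len(values)
                         and len(distinct) == len(values),
    }

table = {
    "CUST_NO": ["0001", "0002", "0003", "0004"],
    "REGION":  ["NE", "NE", None, "SW"],
}
for col, vals in table.items():
    print(profile_column(col, vals))
```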
[Chart: Data sources accessed in data integration projects: Relational databases 89%; Flat files 81%; Mainframe/legacy systems 65%; Packaged applications 39%; Replication or change data capture utilities 15%; EAI/messaging software 15%; Web 12%; XML 15%; Other 4%]
[Diagram: Unstructured data formats accessible to Informatica PowerCenter: PDF, Word, Excel, vertical standards (e.g., HL7, SWIFT, ACORD), print streams, BLOBs, and any proprietary data format/standard]
The flexibility to access all types of enterprise data from a single data integration platform offers significant advantages over hand-coded data migration approaches, including:

- Increased productivity. By centralizing data access and management, PowerCenter frees data migration teams from having to maintain, and depend on, a cumbersome, time-consuming process in which programs are developed to extract and stage data for each source of legacy data.
- Reduced risk. Sources of data for Oracle DBMS implementations tend to be dynamic. Extracting data from a client/server-based legacy application today does not insulate the team from future requirements, for example, having to migrate mainframe and midrange applications acquired through a corporate merger or acquisition. PowerCenter reduces the risk of both current and future data migration efforts by providing access to a broad range of enterprise data formats.
Figure 6: PowerExchange Accesses the Mainframe and Provides a Choice of Latency to Deliver Data When Needed (batch, change, and real-time delivery from legacy/mainframe sources to the Oracle target)
Built-In Data Transformation and Correction Capabilities to Address Data Quality in Legacy Applications
The Informatica product suite enables data migration teams to focus on the data rather than on code. PowerCenter provides a single, unified, scalable enterprise data integration platform with a robust library of transformations and data services capable of handling all of the data conversion in any mainframe data migration project. By leveraging PowerCenter's codeless, wizard-driven approach to Oracle data conversion, teams can focus more on the business rules and data, and less on the code.
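To give a sense of the low-level work such a transformation library absorbs, the sketch below hand-decodes a COBOL COMP-3 (packed decimal) field, one of the classic mainframe-to-relational conversions. It is an illustrative sketch only, not PowerCenter's implementation.

```python
# Hand-rolled decode of a COBOL COMP-3 (packed decimal) field, shown
# only to illustrate the kind of conversion a transformation library
# automates. Packed decimal stores two digits per byte, with the sign
# in the low nibble of the final byte (0xD = negative).

def unpack_comp3(raw: bytes, scale: int = 0) -> float:
    digits = []
    sign = 1
    for i, byte in enumerate(raw):
        hi, lo = byte >> 4, byte & 0x0F
        digits.append(hi)
        if i < len(raw) - 1:
            digits.append(lo)          # interior low nibble is a digit
        elif lo == 0x0D:
            sign = -1                  # final low nibble carries the sign
    value = 0
    for d in digits:
        value = value * 10 + d
    return sign * value / (10 ** scale)

# Bytes 0x12 0x34 0x5C hold digits 1,2,3,4,5 with a positive sign;
# with two implied decimal places this is 123.45.
assert unpack_comp3(b"\x12\x34\x5C", scale=2) == 123.45
```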
Single, Unified, Metadata-Based Data Integration Platform to Support the Data Migration Lifecycle
When data migration projects are driven by teams focused exclusively on the target system rather than on the end-to-end data migration process, a common outcome is the "code, load, and explode" phenomenon. This occurs when developers code the extraction and conversion logic thought to be required for the migration, then attempt to load the result into the target business application, only to discover an unacceptably large number of errors caused by unanticipated values in the source data files. They fix the errors and rerun the conversion process, only to find more errors, and so on. This ugly scenario repeats itself until project deadlines and budgets become imperiled and angry business sponsors halt the project.

PowerCenter breaks this code, load, and explode cycle. PowerCenter provides all the capabilities essential to supporting the data migration lifecycle from a single, unified platform based on a metadata-driven architecture. Figure 7 shows the flow and transformation of data, using PowerCenter, from the mainframe to an Oracle system.
Figure 7: PowerCenter Lineage Diagram Demonstrates the Flow and Transformation of Data From the Mainframe to Oracle RDBMS
The foundation for all of PowerCenter's data integration components is shared metadata. When changes are made anywhere in the profiling, data access, data conversion, or loading process, PowerCenter provides immediate visibility into those changes. With its metadata-driven architecture, PowerCenter promotes faster and more flexible iterations in the data migration lifecycle. Figure 8 shows how PowerCenter is used for migrating data.
[Figure 8: PowerCenter in the data migration lifecycle. Labels shown include Analyze/Profile, Extract/Transform, Validate/Load, Iterate, Synchronize, Audit/Lineage, Reusability/Team Productivity, Packaged Applications, and Target Application.]
PowerCenter's metadata management capabilities provide visibility across the entire data migration process, from sourcing legacy applications and cleansing the legacy data to preparing it in the format required for upload into an Oracle DBMS. PowerCenter enables data lineage problems to be traced at the metadata level.
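The sketch below shows the bare idea of metadata-level lineage: each mapping step records its source, target, and rule, so a suspect value in Oracle can be walked back to its mainframe origin. The field names, rules, and data structure are hypothetical illustrations, not PowerCenter's metadata model.

```python
# Minimal sketch of metadata-level lineage: each step records where a
# field came from and what was done to it, so a suspect value in Oracle
# can be traced back to its mainframe source. Names are hypothetical.

lineage: list[dict] = []

def record_step(source: str, target: str, rule: str) -> None:
    lineage.append({"source": source, "target": target, "rule": rule})

record_step("VSAM.CUSTMAST.CUST-NO", "STG.CUSTOMER.CUST_ID",
            "trim + zero-pad to 10")
record_step("STG.CUSTOMER.CUST_ID", "ORCL.CUSTOMERS.CUSTOMER_ID",
            "direct move")

def trace_back(field: str) -> list[str]:
    """Walk lineage records upstream from a target field."""
    path = [field]
    while True:
        step = next((s for s in lineage if s["target"] == field), None)
        if step is None:
            return path
        field = step["source"]
        path.append(f"{field}  (rule: {step['rule']})")

print("\n".join(trace_back("ORCL.CUSTOMERS.CUSTOMER_ID")))
```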
PowerCenter helps data migration teams trace and prove how data has been converted and moved. The enhanced data visibility and tracking help organizations comply with reporting requirements. These capabilities also aid user adoption, giving new Oracle application users confidence that legacy application data has in fact been converted and moved from the mainframe. Furthermore, PowerCenter alleviates the politics associated with data migration projects: data migration activities, whether related to the legacy mainframe applications or the target Oracle application, can be centralized within a single, unified data integration platform. This promotes effective, productive communication between legacy mainframe and Oracle resources, and between technical and functional resources.
Worldwide Headquarters, 100 Cardinal Way, Redwood City, CA 94063, USA phone: 650.385.5000 fax: 650.385.5500 toll-free in the US: 1.800.653.3871 www.informatica.com
Informatica Offices Around The Globe: Australia Belgium Canada China France Germany Japan Korea the Netherlands Singapore Switzerland United Kingdom USA
© 2006 Informatica Corporation. All rights reserved. Printed in the U.S.A. Informatica, the Informatica logo, and PowerCenter are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.