
Design InfoSphere DataStage jobs for optimum lineage


Design your IBM InfoSphere DataStage jobs to ensure that complete metadata is
available for lineage reports in IBM InfoSphere Metadata Workbench.

When an IBM InfoSphere DataStage and QualityStage job is developed, information
that is included in the job is called design metadata. When you design a job,
you build the data flow from a source of the job to a target in the job.

IBM InfoSphere Metadata Workbench uses design metadata to build lineage reports
that analyze the flow of data from source to target. The lineage analysis
establishes relationships and links between job assets and stages. In addition,
InfoSphere Metadata Workbench uses the design metadata to identify the sources
that the job stages read from or write to. This metadata includes the following
information: the name of the database server or the data connection, the name of
the database schema, any user-defined SQL statements, or the name and location
of the data file.

Information that flows across InfoSphere DataStage and QualityStage jobs is called
design lineage. The data output of one job can be the data source of another job. In
this case, the data source is shared between the two jobs. If a source of the job is
not imported into the metadata repository, the design lineage metadata is used to
infer the relationship with other jobs. This relationship is based on the shared
usage of the referenced data source.
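
The inference from shared usage can be pictured with a minimal sketch. It is an
illustration only, not the algorithm that InfoSphere Metadata Workbench uses.
The job names, source names, and the infer_job_links function are hypothetical.

```python
from collections import defaultdict


def infer_job_links(jobs):
    """Return sorted (writer_job, reader_job, source) triples.

    Two jobs are linked when one job writes to a data source that
    another job reads from, that is, when they share the source.
    """
    writers = defaultdict(set)
    readers = defaultdict(set)
    for job_name, usage in jobs.items():
        for source in usage.get("writes", ()):
            writers[source].add(job_name)
        for source in usage.get("reads", ()):
            readers[source].add(job_name)

    links = []
    for source, writer_jobs in writers.items():
        for writer in writer_jobs:
            for reader in readers.get(source, ()):
                if reader != writer:
                    links.append((writer, reader, source))
    return sorted(links)


# Hypothetical jobs: the output of LoadStaging is the source of BuildMart.
jobs = {
    "LoadStaging": {"reads": ["/data/in/orders.csv"], "writes": ["STG.ORDERS"]},
    "BuildMart": {"reads": ["STG.ORDERS"], "writes": ["MART.ORDER_FACT"]},
}
links = infer_job_links(jobs)
```

In this sketch the link depends only on both jobs referring to the source by
the same name, which is why consistent naming across jobs matters for lineage.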

Use the following table of actions to ensure that your job design gives complete
metadata for best lineage results.
Table 1. Actions to ensure complete job design metadata for data lineage

Action: Use Connector stages
Description: Connector stages give the maximum amount of metadata about the job
design. Therefore, use Connector stages instead of equivalent generic stages.
For example, use the ODBC Connector stage rather than the ODBC Enterprise stage.
How this action affects lineage: The Manage Lineage utility reads the design
lineage metadata from the stages of the job. The Manage Lineage utility then
infers the database or data file assets that the job reads from or writes to.
Connector stages provide more information to enhance the utility.
Additional information: For a list of job stages with their description, see
Alphabetical list of stages. Whether a particular stage is displayed on the
InfoSphere DataStage Designer client palette depends on the type of job and the
installed products and add-ons.

Action: Use environment variables and job parameters
Description: You can define variables and parameters to reuse across all jobs of
a project by using environment variables and job parameters. Wherever possible,
use parameters and parameter sets as common references across all jobs in a
project.
How this action affects lineage: The use of variables reduces error and promotes
data reuse in job development.
Additional information: For more information about how to set up job parameters
and parameter sets, see Making your jobs adaptable. For general information
about setting environment variables, see Guide to setting environment variables.
For general information about environment variables, see Environment variables.

Action: Import project-level environment variables
Description: Before you run lineage reports, you must import the project-level
environment variables that you defined in InfoSphere DataStage into InfoSphere
Metadata Workbench.
How this action affects lineage: InfoSphere Metadata Workbench uses the
environment variables to reconcile and link the job with referenced sources.
Additional information: For information about how to import environment
variables, see Import project-level environment variables.

Action: Check the project-level environment variables
Description: To list the environment variables that are defined for the project,
use the dsadmin utility.
Additional information: For information about how to run this utility, see
Listing environment variables.

Action: Load columns of database and file stages from shared metadata
Description: Table definitions carry information about your source and target
data, such as the name and structure of the database tables or files that
contain your data. Within a table definition are column definitions. Column
definitions contain information about the column name, column length, data type,
and other column properties, such as keys and null values.
How this action affects lineage: InfoSphere Metadata Workbench requires table
and column definitions to match imported database assets to jobs and to other
assets in the metadata repository.
Additional information: For more information about shared metadata in InfoSphere
DataStage, see Shared metadata.

Action: When you import a data file, ensure that its name and directory path are
defined in the same way that they are defined in the stage
Description: The name and directory path of the imported or shared data file
must match the name and directory path in the stage.
How this action affects lineage: If the name or directory path is not the same
as it is in the stage, the data file and stage cannot be linked correctly in the
job data flow. As a result, the lineage is incorrect or incomplete.

Action: Use job parameters to define file names and directory paths
Description: To minimize errors, use job parameters wherever possible.
Additional information: For information about job parameters, see Job
parameters.

Action: Use the default SQL statements rather than user-defined SQL
Description: In InfoSphere Metadata Workbench, the schema and database table
name of the imported database must be the same as the schema and table name in
the stage. You can generate default SQL statements to read from and write to
data sources. Alternatively, you can define SQL statements that read from and
write to data sources.
How this action affects lineage: The Manage Lineage utility parses all SQL
statements to extract information about the schema, owner, database tables, and
columns. The utility then maps this information to shared database tables that
were previously imported. User-defined SQL that contains complex statements
might not be parsed correctly. If statements are not parsed correctly, you must
run the Manual Binding utility. This utility manually sets the relationships
between stages and data sources and between stages and other stages.
Additional information: For information about user-defined SQL in InfoSphere
DataStage, see User-defined SQL. For information about job design considerations
and SQL, see Job design considerations.

Action: Set up a logging view and review the metadata workbench logs
Description: You can view the log information in the IBM InfoSphere Information
Server Web console.
Additional information: For information about log views and their configuration
in InfoSphere Metadata Workbench, see Log messages, Creating logging
configurations, and Creating log views.

Action: Query InfoSphere DataStage jobs in InfoSphere Metadata Workbench
Description: On the Discover tab, you can run the Job Design Usage published
query to see the links between jobs and their sources. You can also construct
your own queries to see the stage types of a project.
Additional information: For general information about queries, see Queries. For
information about creating queries, see Creating queries.
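
The requirement that a data file name and directory path match the stage, and
the recommendation to define them with job parameters, can be sketched as
follows. This is an illustration under stated assumptions, not the matching
logic of InfoSphere Metadata Workbench. It assumes that the stage property
refers to job parameters with the #ParameterName# notation and that a match
means exact string equality after the parameters are resolved. The function
names, parameter names, and paths are hypothetical.

```python
import re


def resolve_parameters(value, parameters):
    """Replace #name# placeholders with parameter values.

    Placeholders whose name is not in `parameters` are left unchanged.
    """
    def substitute(match):
        return parameters.get(match.group(1), match.group(0))

    return re.sub(r"#([^#]+)#", substitute, value)


def same_data_file(stage_path, imported_path, parameters):
    """Return True when the resolved stage path equals the imported path."""
    return resolve_parameters(stage_path, parameters) == imported_path


# Hypothetical job parameters and paths.
params = {"SourceDir": "/data/in", "FileName": "orders.csv"}
stage_path = "#SourceDir#/#FileName#"

matches = same_data_file(stage_path, "/data/in/orders.csv", params)
mismatch = same_data_file(stage_path, "/data/IN/orders.csv", params)
```

Here matches is True and mismatch is False, because the second imported path
differs in case from the path that the stage resolves to. Defining the path
once in parameters keeps every stage that uses them consistent.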

After you complete these actions, you are ready to set up InfoSphere Metadata
Workbench to analyze metadata for lineage. Follow these steps:
1. Run the Manage Lineage utility.
This utility automatically runs the Manual Binding and Map Database Alias
utilities.
2. To identify schemas that are identical, run the Data Source Identity utility.
If two schemas are identified as identical, the database tables and database
columns contained by the schemas are also marked as identical when their
names match. This might be necessary when the same data source is imported
into the repository by different means, such as by a connector and a bridge.
3. Run the data lineage report.
The data lineage report shows the movement of data within a job or through
multiple jobs. The report can also show the order of activities in a run of a job.
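
The effect described in step 2 can be pictured with a minimal sketch. It is not
the implementation of the Data Source Identity utility. It assumes that each
schema is represented as a mapping of table names to column names, and the
mark_identical function and sample schemas are hypothetical.

```python
def mark_identical(schema_a, schema_b):
    """Pair the assets of two schemas that were identified as identical.

    Returns a sorted list of (table, column) tuples. A tuple with column
    None marks a table pair; other tuples mark column pairs. Only assets
    whose names match in both schemas are included.
    """
    pairs = []
    for table in sorted(set(schema_a) & set(schema_b)):
        pairs.append((table, None))
        shared_columns = set(schema_a[table]) & set(schema_b[table])
        for column in sorted(shared_columns):
            pairs.append((table, column))
    return pairs


# Hypothetical: the same data source imported by a connector and by a bridge.
via_connector = {"ORDERS": ["ID", "AMOUNT"], "CUSTOMERS": ["ID"]}
via_bridge = {"ORDERS": ["ID", "AMOUNT", "NOTE"], "REGIONS": ["ID"]}

identical = mark_identical(via_connector, via_bridge)
```

In this sketch only the ORDERS table and its ID and AMOUNT columns are paired.
Tables and columns that exist in just one of the two schemas are left unmarked.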
