1. Talend Architecture
2. Talend DI Open Studio Introduction
3. Creating Repository, Project and a Job
4. Talend DI Basic Components
5. Parallelism
6. Error and Logs Handling
7. Scheduling and Execution
8. Version management
9. Case Studies
10. Queries
Course Objective
Hands On Experience
Talend Overview
Talend Products
Talend Open Studio
Talend Architecture & Supporting Platforms
Talend Architecture Components
Talend Supporting Platforms
Talend Data Integration supports the following third-party components, products,
and operating systems. The level of support varies across products.
Intro - Talend DI Open Studio
ETL (Extract/Transform/Load)
Data migration
Important Concepts
GUI
GUI Contd
Business Model
Business Model- An example
Job Design
Context Management
Selecting the execution context
When Talend Open Studio starts:
During deployment:
Key points:
The context is included in the generated code and cannot be modified once the Job is deployed.
When exporting the Job, you can choose the execution context of the Job and of the subjobs launched by a
tRunJob.
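The idea of choosing one execution context among several named ones can be sketched in plain Java. This is only an illustration of the concept, not the code Talend generates; the context names (`Dev`, `Prod`) and the `inputDir` variable are assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: models named contexts, not Talend's generated code.
class ContextDemo {
    // Hypothetical contexts and variables; all names and paths are assumptions.
    static Map<String, Map<String, String>> contexts() {
        Map<String, Map<String, String>> all = new HashMap<>();
        Map<String, String> dev = new HashMap<>();
        dev.put("inputDir", "/tmp/dev/in");
        Map<String, String> prod = new HashMap<>();
        prod.put("inputDir", "/data/prod/in");
        all.put("Dev", dev);
        all.put("Prod", prod);
        return all;
    }

    // Pick the context by name, falling back to the default when the
    // requested name is missing or unknown.
    static Map<String, String> select(String name, String defaultName) {
        Map<String, Map<String, String>> all = contexts();
        String key = (name == null || !all.containsKey(name)) ? defaultName : name;
        return all.get(key);
    }

    public static void main(String[] args) {
        String requested = args.length > 0 ? args[0] : null;
        System.out.println(select(requested, "Dev").get("inputDir"));
    }
}
```

Every variable keeps the same name in each context; only the values differ, which is why a Job can switch context without any change to its design.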
Basic Components
Processing Components
File Components
Misc Components
Big Data Components
Database Components
Oracle Components
Processing Components
Misc Components
File Components
Log & Errors
Job Creation Steps
1. Setup/Create Project
Locate the Talend Open Studio for Data Integration executable
(TOS_DI-win-x86_64.exe) and double-click it to launch the Studio.
2. Open Project
3. Create Job
4. Provide Job Name
5. Add Component
6. Setting Properties
Basic Settings tab
7. Set Schema
8. Advanced Setting Tab
9. Add Transform Component
10. Map Editor
The Map Editor is made up of several panels:
Input panel: the top left panel of the editor. It gives a graphical
representation of all incoming data flows (main and lookup). The data is organized
into the columns of the input tables.
Variable panel: the central panel of the Map Editor. It centralizes redundant
information by mapping it to variables and lets you carry out transformations.
Output panel: the top right panel of the editor. It maps data and fields from
the input tables and variables to the appropriate output rows.
Schema editor: the bottom panels, which offer a schema view of all columns
of the input and output tables.
Expression editor: the editing tool for all expression keys of input/output
data, variable expressions, and filter conditions.
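How the panels relate to one another can be sketched in plain Java: an input row, a variable computed once, a filter condition, and an output expression. This is a hand-written illustration, not what the Map Editor generates; the row fields (`firstName`, `lastName`, `age`) and the filter rule are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics the input -> variable -> output flow of a mapping.
class MapSketch {
    // Hypothetical input row; the field names are assumptions.
    static class InRow {
        final String firstName;
        final String lastName;
        final int age;

        InRow(String firstName, String lastName, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
        }
    }

    static List<String> map(List<InRow> input) {
        List<String> output = new ArrayList<>();
        for (InRow row : input) {
            // Variable panel: compute a shared value once per row.
            String fullName = row.firstName + " " + row.lastName;
            // Filter condition on the output table (assumed rule: adults only).
            if (row.age >= 18) {
                // Output expression built from the variable.
                output.add(fullName.toUpperCase());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<InRow> rows = new ArrayList<>();
        rows.add(new InRow("Ann", "Lee", 30));
        rows.add(new InRow("Bob", "Ray", 12));
        System.out.println(map(rows));
    }
}
```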
10. Map Editor Contd
Expression editor
11. Connection Types
A Job or a subjob is composed of a group of components logically
linked to one another via connections.
Main
Rejects
Gathers the data that does NOT match the filter or is not valid
for the expected output.
Lookup
OnComponentError
OnSubjobOK
Triggers the next subjob on the condition that the main subjob
completed without error.
OnSubjobError
Triggers the next subjob if the first (main) subjob does not
complete correctly.
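The semantics of the row and trigger connections described above can be approximated in plain Java. This is a conceptual sketch, not Talend-generated code: `run` stands in for the OnSubjobOK / OnSubjobError choice, and `split` stands in for a filter with a Main and a Rejects output. The filter rule (non-negative values) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models trigger and reject semantics in plain Java.
class TriggerSketch {
    // Runs the "main" subjob and reports which trigger would fire next.
    static String run(Runnable mainSubjob) {
        try {
            mainSubjob.run();
        } catch (RuntimeException e) {
            return "OnSubjobError";
        }
        return "OnSubjobOK";
    }

    // Splits rows into a main flow (index 0) and a rejects flow (index 1).
    static List<List<Integer>> split(List<Integer> rows) {
        List<Integer> main = new ArrayList<>();
        List<Integer> rejects = new ArrayList<>();
        for (Integer row : rows) {
            if (row >= 0) {
                main.add(row);      // matches the filter
            } else {
                rejects.add(row);   // does not match the filter
            }
        }
        List<List<Integer>> flows = new ArrayList<>();
        flows.add(main);
        flows.add(rejects);
        return flows;
    }

    public static void main(String[] args) {
        System.out.println(run(() -> { }));
        System.out.println(run(() -> { throw new IllegalStateException("boom"); }));
        List<Integer> rows = new ArrayList<>();
        rows.add(1);
        rows.add(-2);
        rows.add(3);
        System.out.println(split(rows));
    }
}
```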
11. Connection Types-Example
12. Add Output Component
The tFileOutputDelimited component writes rows to a file in
delimited format.
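What writing rows in delimited format amounts to can be shown with a small plain-Java sketch. It is not the component's implementation: the class name, the sample rows, and the `;` separator are assumptions, and no escaping of separators inside field values is attempted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: joins fields with a separator and writes one row per line.
class DelimitedWriterSketch {
    // Turns each row into one delimited line (no quoting or escaping).
    static List<String> format(List<String[]> rows, String separator) {
        List<String> lines = new ArrayList<>();
        for (String[] row : rows) {
            lines.add(String.join(separator, row));
        }
        return lines;
    }

    // Writes the delimited lines to the given file as UTF-8.
    static void write(Path file, List<String[]> rows, String separator) {
        try {
            Files.write(file, format(rows, separator), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "Smith", "Leeds"});   // made-up sample data
        rows.add(new String[] {"2", "Doe", "York"});
        Path out = Files.createTempFile("customers", ".csv");
        write(out, rows, ";");
        System.out.println("Wrote " + out);
    }
}
```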
13. Run a Job
14. Build Job
14. Build Job Contd
15. Build Job-Jar/sh/batch file
Job Designer: tMap and Lookup
CUSTOMER
WITH STATES
Database Connection-Step 1
Database Connection-Step 2
Database Connection-Step 3
Test Connection
Database Connection Contd
Joins and Transformation
Example
Joins and Transformation
Example Contd
Id;Title;FirstName;LastName;AddressId;Street;Town;County;Postcode
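A lookup join that produces rows with the header above might look like the following plain-Java sketch. It assumes, hypothetically, that the first five columns come from a customer input and the last four from an address lookup keyed on `AddressId`; the sample data is made up, and customers without a matching address are dropped (inner join).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: joins a main flow with a lookup flow on AddressId.
class LookupJoinSketch {
    static final String HEADER =
        "Id;Title;FirstName;LastName;AddressId;Street;Town;County;Postcode";

    // customers: {Id, Title, FirstName, LastName, AddressId}
    // addressesById: AddressId -> {Street, Town, County, Postcode}
    static List<String> join(List<String[]> customers, Map<String, String[]> addressesById) {
        List<String> out = new ArrayList<>();
        out.add(HEADER);
        for (String[] customer : customers) {
            String[] address = addressesById.get(customer[4]);
            if (address == null) {
                continue; // inner join: skip customers with no matching address
            }
            out.add(String.join(";", customer) + ";" + String.join(";", address));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> customers = new ArrayList<>();
        customers.add(new String[] {"1", "Mr", "John", "Smith", "10"});
        customers.add(new String[] {"2", "Ms", "Jane", "Doe", "99"});
        Map<String, String[]> addresses = new HashMap<>();
        addresses.put("10", new String[] {"1 High St", "Leeds", "West Yorkshire", "LS1 1AA"});
        for (String line : join(customers, addresses)) {
            System.out.println(line);
        }
    }
}
```

Loading the lookup side into a map first keeps the join to a single pass over the main flow.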
Routines
Routines are fairly complex Java functions, generally used to factor out
shared code. They help optimize data processing and extend what a Job
can do.
Types of Routines:
System Routines: classified according to the type of data they
process, such as numerical, string, or date.
User Routines: routines that the user creates or adapts
from existing routines.
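A user routine is essentially a Java class with public static methods. The sketch below is a hypothetical example: the class name `NameRoutines` and the method `capitalize` are made up for illustration. In Talend Studio such a class would normally sit in the `routines` package; the package line is left out here so the snippet compiles on its own.

```java
// Illustrative user routine. In the Studio it would normally declare
// "package routines;" (omitted here so the file is self-contained).
class NameRoutines {
    /**
     * Trims the input, upper-cases its first character and lower-cases
     * the rest. Returns null for null input and "" for blank input.
     */
    public static String capitalize(String value) {
        if (value == null) {
            return null;
        }
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return trimmed;
        }
        return Character.toUpperCase(trimmed.charAt(0))
            + trimmed.substring(1).toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("  jOHN "));
    }
}
```

Inside a Job, a routine like this would typically be called from an expression as `NameRoutines.capitalize(row1.FirstName)`, where `row1.FirstName` is an assumed input column.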
System Routines
Numeric Routines:
The Numeric category contains several routines, notably
sequence, random, and decimal (convertImpliedDecimalFormat).
Relational Routines:
Let you check assertions based on Booleans.
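The idea behind a named sequence, as mentioned for the Numeric category, can be illustrated with a simplified stand-in. This is not Talend's implementation; the class `SequenceSketch`, its `next` method, and the parameter order are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a simplified named counter, not Talend's Numeric routine.
class SequenceSketch {
    private static final Map<String, Integer> COUNTERS = new HashMap<>();

    // Returns `start` on the first call for a given name,
    // then the previous value plus `step` on each later call.
    static synchronized int next(String name, int start, int step) {
        Integer current = COUNTERS.get(name);
        int value = (current == null) ? start : current + step;
        COUNTERS.put(name, value);
        return value;
    }

    public static void main(String[] args) {
        System.out.println(next("orders", 1, 1));
        System.out.println(next("orders", 1, 1));
        System.out.println(next("batches", 10, 5));
    }
}
```

Keeping one counter per name lets several flows in the same run number their rows independently.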
System Routines Contd
TalendString Routines
User Routines
Parallelism
Job Parallelization
Talend can run subjobs in parallel (multi-threading)
in two ways.
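Running independent subjobs in parallel is conceptually close to submitting tasks to a thread pool. The sketch below uses the standard `java.util.concurrent` API to show the idea; it is not how Talend implements parallelization, and the subjob names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: runs independent "subjobs" on separate threads.
class ParallelSubjobs {
    // Runs all subjobs concurrently and returns their results in submission order.
    static List<String> runAll(List<Callable<String>> subjobs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subjobs.size()));
        try {
            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(subjobs)) {
                results.add(future.get()); // waits for that subjob to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> subjobs = new ArrayList<>();
        subjobs.add(() -> "load customers: done");   // hypothetical subjob
        subjobs.add(() -> "load addresses: done");   // hypothetical subjob
        System.out.println(runAll(subjobs));
    }
}
```

Parallel execution only helps when the subjobs do not depend on each other's output.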
Job Parallelization Contd
Error and Log Handling
Error and Log Components
The Logs & Errors family groups together the components
dedicated to catching log information and handling Job
errors.
Examples: tAssert, tAssertCatcher, tLogRow, etc.
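The general pattern behind a log catcher, collecting error events in one place instead of handling them in every step, can be sketched in plain Java. This is a conceptual illustration, not the behaviour of any specific Talend component; the `Event` fields and the step names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: one central place that records errors raised by steps.
class LogCatcherSketch {
    // Hypothetical event record; the field names are assumptions.
    static class Event {
        final String origin;
        final String type;
        final String message;

        Event(String origin, String type, String message) {
            this.origin = origin;
            this.type = type;
            this.message = message;
        }
    }

    private final List<Event> events = new ArrayList<>();

    // Runs a step and records an ERROR event if it throws.
    void runStep(String origin, Runnable step) {
        try {
            step.run();
        } catch (RuntimeException e) {
            events.add(new Event(origin, "ERROR", String.valueOf(e.getMessage())));
        }
    }

    List<Event> events() {
        return events;
    }

    public static void main(String[] args) {
        LogCatcherSketch catcher = new LogCatcherSketch();
        catcher.runStep("readFile", () -> { });
        catcher.runStep("loadDb", () -> { throw new IllegalStateException("connection refused"); });
        for (Event event : catcher.events()) {
            System.out.println(event.type + " from " + event.origin + ": " + event.message);
        }
    }
}
```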
Error and Log handling
Management of Logs/Preferences
Management of Logs/Preferences Contd
In the Properties and Job Designs view, preferences are entered
automatically:
Customizing error logs
tLogCatcher schema
Default schema
tStatCatcher
Scheduling and Execution
Schedule job using Crontab
sh ./JoinExample_run.sh
crontab -e
Scheduling
(Within Talend DI Open Studio)
The Scheduler view in DI lets you schedule a task that periodically
launches a Job through a task-scheduling program (crontab).
Scheduling Contd
(Within Talend DI Open Studio)
Job Execution & Scheduling
(Outside of Talend DI Open Studio)
Talend Open Studio (TOS) can export Jobs in several
export types:
Autonomous Job
Axis WebService (WAR)
Axis WebService (ZIP)
JBoss ESB
Petals ESB
OSGI Bundle For ESB
Version management
Version Management
When a Job is created in Talend Studio, by default its version
is 0.1, where 0 stands for the major version and 1 for the
minor version.
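The major.minor scheme described above can be modelled with a few lines of Java. This is an illustration of the numbering only, not Talend code; the class `JobVersion` is hypothetical, and resetting the minor number to 0 on a major bump is an assumption of this sketch.

```java
// Illustrative only: models a "major.minor" Job version such as 0.1.
class JobVersion {
    final int major;
    final int minor;

    JobVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Parses text of the form "major.minor", for example "0.1".
    static JobVersion parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected major.minor, got: " + text);
        }
        return new JobVersion(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    JobVersion bumpMinor() {
        return new JobVersion(major, minor + 1);
    }

    // Assumption: a major bump resets the minor number to 0.
    JobVersion bumpMajor() {
        return new JobVersion(major + 1, 0);
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }

    public static void main(String[] args) {
        JobVersion initial = parse("0.1");
        System.out.println(initial.bumpMinor());
        System.out.println(initial.bumpMajor());
    }
}
```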
Version Management Contd
Case Studies
Case Study
Queries???
Thank You!!!