Thursday, October 31, 2013

How to run Jobs remotely on TIS?

If you have JobServer installed on your box, then running Jobs remotely involves two steps.

1) Configure your TIS to point to the remote Job Server
2) Run the Talend Job from the Studio on the remote Job Server.

Once you have configured your remote Job Server in the TIS preferences, you can run the Job remotely by selecting that server on the Target Exec tab of the "Run" view for the Talend Job.

I do not have Job Server installed. But if you do, you could run the Talend Job from your local studio on a remote server. Very handy!

Happy Talending!
Praveena

How to capture Talend Job Activity into databases - Activity Monitoring Console

It is often useful to capture the statistics of a job to monitor the job performance and error codes. Talend has three ways to capture this information.

  1. On the console
  2. In the file system
  3. In a database

My personal choice is to capture the logging information in a database, since that keeps the history and makes it easy to track trends in Job activity with a query. This can be very handy for identifying performance bottlenecks.

So here are the table definitions for the logs, stats and volume catchers used by the AMC. You can easily get these definitions by exporting the metadata from tLogCatcher, tFlowMeter and tStatCatcher to Generic Schemas in the repository and reusing them later. Here they are served without going through the pain of creating the metadata again :)

Create these tables in your database and configure your project to capture this information in the database, so that the AMC can retrieve it from the database and display it for you.


CREATE TABLE [dbo].[logCatcher](
[moment] [datetime] NULL,
[pid] [varchar](20) NULL,
[root_pid] [varchar](20) NULL,
[father_pid] [varchar](20) NULL,
[project] [varchar](50) NULL,
[job] [varchar](255) NULL,
[context] [varchar](50) NULL,
[priority] [int] NULL,
[type] [varchar](255) NULL,
[origin] [varchar](255) NULL,
[message] [varchar](255) NULL,
[code] [int] NULL
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[flowMeter](
[moment] [datetime] NULL,
[pid] [varchar](20) NULL,
[father_pid] [varchar](20) NULL,
[root_pid] [varchar](20) NULL,
[system_pid] [bigint] NULL,
[project] [varchar](50) NULL,
[job] [varchar](50) NULL,
[job_repository_id] [varchar](255) NULL,
[job_version] [varchar](255) NULL,
[context] [varchar](50) NULL,
[origin] [varchar](255) NULL,
[label] [varchar](255) NULL,
[count] [int] NULL,
[reference] [int] NULL,
[thresholds] [varchar](255) NULL
) ON [PRIMARY]

GO


CREATE TABLE [dbo].[statCatcher](
[moment] [datetime] NULL,
[pid] [varchar](20) NULL,
[father_pid] [varchar](20) NULL,
[root_pid] [varchar](20) NULL,
[system_pid] [bigint] NULL,
[project] [varchar](50) NULL,
[job] [varchar](50) NULL,
[job_repository_id] [varchar](255) NULL,
[job_version] [varchar](255) NULL,
[context] [varchar](50) NULL,
[origin] [varchar](255) NULL,
[message_type] [varchar](255) NULL,
[message] [varchar](255) NULL,
[duration] [bigint] NULL
) ON [PRIMARY]

GO

These table definitions are for SQL Server.
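
Once the statCatcher table fills up, tracking trends with a query could look roughly like the sketch below. It is only an illustration: SQLite (through Python's sqlite3 module) stands in for SQL Server so that it runs anywhere, the table is cut down to a few of the columns defined above, and the sample rows, the Job names and the 'end' / 'success' / 'failure' marker values are assumptions, so check what your own statCatcher rows actually contain.

```python
import sqlite3

# SQLite stands in for SQL Server here so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE statCatcher (
        moment TEXT,
        job TEXT,
        message_type TEXT,
        message TEXT,
        duration INTEGER
    )
    """
)

# Hypothetical sample rows: Job names, durations and marker values are assumptions.
rows = [
    ("2013-10-29 01:00:00", "load_orders", "end", "success", 42000),
    ("2013-10-30 01:00:00", "load_orders", "end", "success", 61000),
    ("2013-10-31 01:00:00", "load_orders", "end", "failure", 5000),
    ("2013-10-31 02:00:00", "load_customers", "end", "success", 8000),
]
conn.executemany("INSERT INTO statCatcher VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()


def job_trends(connection):
    """Return (job, runs, failures, avg_duration) per Job, slowest first."""
    query = """
        SELECT job,
               COUNT(*) AS runs,
               SUM(CASE WHEN message = 'failure' THEN 1 ELSE 0 END) AS failures,
               AVG(duration) AS avg_duration
        FROM statCatcher
        WHERE message_type = 'end'
        GROUP BY job
        ORDER BY avg_duration DESC
    """
    return connection.execute(query).fetchall()


trends = job_trends(conn)
for job, runs, failures, avg_duration in trends:
    print(job, runs, failures, avg_duration)
```

The same SELECT should carry over to SQL Server with little or no change; only the connection code would differ.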

Happy Talending!
Praveena

Diff Tool in Talend - Compare Jobs

Often, we need to compare different versions of a Talend Job. It could be for troubleshooting, or for writing the "Release Notes" that help developers with maintenance.
A very useful feature for diffing is Talend's "Compare Job", available only in TIS. It can save developers a lot of time in understanding what has changed (especially with minor changes) without even opening the Job.

Screenshots follow.

The option to save the comparison to an HTML file is very useful when we want to log all the changes in a release: run the diff on the delta of files that were modified in that release.

Happy Talending!
Praveena


Talend Migration from 4.X to 5.X

Recently, we started working on a Talend migration from 4.X to 5.X. The biggest surprise we encountered was that the migration of Jobs happens automatically, without a confirmation message.

We expected something along the lines of "The migration is about to start. Do you want to proceed, Yes/No".

This means that if you open a Talend 4.X Job in a TIS/TOS 5.X Studio, the Job is automatically migrated to 5.X. Since we had an unplanned, accidental migration, we had to revert the code back to 4.X in Subversion, and we ran into several issues while reverting. This post shares our experience with the migration.

The classic one was the "Cannot find Talend.Project" error. If you look in Subversion, you can see the talend.project file, but the Studio complains that it cannot find it. We still do not know which single action solved the error, but doing the following helped us resolve it.


  1. Release all the locks in Talend using TAC.
  2. Make sure that the project URL to Subversion is reachable.
  3. Make sure that the account used to add a project in Subversion has Subversion access privileges to the project.
  4. Make sure that the project references have the authorization set up properly.
  5. Make sure that the Subversion URL is valid and the credentials are good.
  6. Make sure each user in the TAC has an SVN login configured.

I cannot share detailed screenshots for this exercise, but let me know if you run into a similar issue during migration. Would be happy to help!

Error - Talend.Project doesn't exist


Happy Talending!
Praveena



Tips to interact with Databases efficiently using Talend


  1. In the Database Output component, select "Enable Parallel Execution" (the last option in the Advanced Settings).
  2. Use a reasonable batch size (the default is 10,000 rows).
  3. Use the ELT components and the Bulk Execute components wherever applicable.
  4. Select only the columns you need from your datasets. Avoid SELECT * in your queries.
  5. Select only the rows you need by limiting them with a WHERE clause.
  6. Do not perform blocking operations such as aggregation or sorting in Talend. The database is more efficient at these operations. To aggregate or sort, the entire dataset has to be loaded before the operation is performed, so a huge dataset could choke the memory and CPU.
  7. Understand the "Commit" options. Committing frequently has pros and cons:

    • Example: you have 20,000 rows to write to the database, and you commit every 5,000 rows. If an error occurs after the second commit (that is, after 5,000 + 5,000 = 10,000 rows), the database is left with only part of the rows written. Unless your Job is re-runnable, you could end up having to clean up before you start again.
    • Committing frequently avoids huge log files on the database.
    • Committing only at the end could have an impact on memory usage if you are waiting to commit a large batch of rows.
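
The 20,000-row commit example above can be simulated outside Talend to see the partial-write state for yourself. The sketch below is not what a Talend output component does internally; it uses Python's sqlite3 module as a stand-in database, and the table name `target`, the `load_rows` helper and the failing row number are all hypothetical.

```python
import sqlite3


def load_rows(conn, total_rows, commit_every, fail_at=None):
    """Insert rows 1..total_rows, committing every `commit_every` rows.

    `fail_at` is a hypothetical row number at which the load fails.
    Returns the number of rows that ended up persisted in the table.
    """
    cur = conn.cursor()
    try:
        for i in range(1, total_rows + 1):
            if i == fail_at:
                raise RuntimeError(f"simulated error at row {i}")
            cur.execute("INSERT INTO target (id) VALUES (?)", (i,))
            if i % commit_every == 0:
                conn.commit()
        conn.commit()
    except RuntimeError:
        # Rollback only discards the rows written since the last commit.
        conn.rollback()
    return conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]


def fresh_db():
    """Create an empty in-memory database with the hypothetical target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY)")
    conn.commit()
    return conn


# Commit every 5,000 rows; the error hits after the second commit.
partial = load_rows(fresh_db(), 20000, 5000, fail_at=12001)

# Commit only once at the end; the same error leaves nothing behind.
all_or_nothing = load_rows(fresh_db(), 20000, 20000, fail_at=12001)

print(partial, all_or_nothing)
```

With frequent commits the first 10,000 rows stay in the table after the failure, so a re-run has to skip or clean them up first. With a single commit at the end the table stays empty, at the cost of holding the whole batch in one open transaction.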

How to import an external library to Talend Jobs?

You can import an external library into Talend Jobs in two ways.

  1. Using the tLibraryLoad component (the library is available only to the specific Job that uses this component).
  2. Adding the external library to Talend's Routine Library (the library is available to all Jobs).

Examples follow.

Adding the external library to the Talend's Routine Library

Happy Talending!
Praveena

Parallelization in Talend

You can achieve Parallelization in Talend in 2 ways.

  1. Running SubJobs in parallel by using Multi-threaded Execution.
    • The option to enable Multi-threaded Execution is hidden in the Job view of the Studio.
    • Also note that enabling multi-threading on a single processor could hurt performance.
  2. Using the tParallelize component of Talend.
    • The tParallelize component is an Orchestration component.

Screenshots of a simple sample Job running as a single thread, with Multi-threaded Execution, and with tParallelize are shown below.

Job running as a single thread

Using Multi-threaded Execution

Using the tParallelize component of Talend
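
To get a feel for why running independent SubJobs in parallel helps, here is a rough analogy in plain Python. It is not Talend code and says nothing about how Multi-threaded Execution or tParallelize are implemented: the `subjob` function, the SubJob names and the sleep times are all made up, and a thread pool simply stands in for "start the SubJobs together and wait for all of them".

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical SubJobs: (name, seconds of simulated work).
SUBJOBS = [("subjob_a", 0.2), ("subjob_b", 0.2), ("subjob_c", 0.2)]


def subjob(name, seconds):
    """Stand-in for an independent SubJob that mostly waits (e.g. on I/O)."""
    time.sleep(seconds)
    return f"{name} done"


def run_sequential():
    """Run the SubJobs one after another, as in a single-threaded Job."""
    return [subjob(name, seconds) for name, seconds in SUBJOBS]


def run_parallel():
    """Start all SubJobs at once and wait until every one has finished."""
    with ThreadPoolExecutor(max_workers=len(SUBJOBS)) as pool:
        futures = [pool.submit(subjob, name, seconds) for name, seconds in SUBJOBS]
        return [future.result() for future in futures]


start = time.perf_counter()
sequential_results = run_sequential()
sequential_time = time.perf_counter() - start

start = time.perf_counter()
parallel_results = run_parallel()
parallel_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

The simulated SubJobs only wait, so they overlap well. SubJobs that are CPU-bound would compete for the processor instead, which is the single-processor caveat mentioned above.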

Monday, October 21, 2013

DataViewer - a useful feature to preview data

Many times, we maintain someone else's code and would like to see what a component is retrieving. The DataViewer is a gem hidden in the context menu that can be quite useful for previewing data.

Right-clicking a database component gives us the DataViewer option for previewing its data.

Happy Talending!
Praveena

Friday, October 18, 2013

Talend tWarn component Gotcha


The tWarn component can provide useful information such as the project name, the Job name, and any columns of data, in addition to the message to be displayed. This kind of information can be used to raise an alert that the Job is running successfully but under a condition that might indicate an issue. This can be very useful for troubleshooting or maintenance purposes.

The big GOTCHA here is that without the tLogCatcher component, the warning message is available as context data but does not automatically appear in the console. So if you include tWarn components in your Jobs, be sure to include a tLogCatcher to catch the warning messages.

PS - CTRL+SPACE brings up template proposals in Talend. So whenever you want to see which context variables or globalMap variables are available to choose from, try CTRL+SPACE.


Wednesday, October 16, 2013

Steps to install/upgrade Talend license

The following are the steps to upgrade the Talend license on a Talend Administration Center (TAC) server:

  1. Copy the license file to /usr/local/talend-cmdline/
  2. Restart the CommandLine application: kill the running process and then run “nohup ./commandline.sh &”
  3. Open TAC and log in.
  4. Click “License” in the left pane.
  5. Click “Browse” in the center pane.
  6. Locate and select the license file.
  7. Click “Upload”.

The license should now be installed and the TAC will no longer display the “License will expire” message.

Happy Talending!