Friday, October 14, 2011

How to execute multiple sql statements in Talend



Firstly, Why would you want to execute multiple SQL statements in a single Component?
  1. Ease of use and cleaner code. If all you are doing is a series of SQL statements, then you might be better off with a single SQLRow component than a train of Talend components making your job look messy. For Ex - If you are truncating the staging tables in a Data Warehouse environment, then a single Talend component is convenient
  2. It might be better if you create a store procedure for this purpose but sometimes, you might not have create-store-procedure-privilege on your schema. Then executing a series of SQL statements in SQLRow component by changing the "Additional JDBC parameters" setting as shown above might be the right thing to do.
We need to supply additional JDBC parameters because MYSQL jdbc driver, by default permits running only one query per jdbc connection and terminates after finding the first semi-colon (;). Above is a screenshot of how to modify the jdbc parameters.

Happy Talending!
-Praveena

Friday, October 7, 2011

Talend ELT Components

Purpose of the ELT Talend Components :

The ELT Talend components (Input, Mapper and Output) are useful for bulk loading the tables by pushing the processing overhead to the Database server rather than iterating through the resultsets in memory.

Example ELT Components for MYSQL. 

The three components (input, map, output) are closely tied and have to be used together.
  1. tELTMySqlInput
  2. tELTMySqlMap
  3. tELTMySqlOutput
 What can you do with ELT Components :
  1. The main purpose of the ELT components is PERFORMANCE.
  2. Bulk read/insert into tables by using different tables and all types of JOINS, WHERE clauses and GROUP BY and HAVING clauses.
What can you NOT do with ELT Components :
  1. You cannot JOIN tables from different SCHEMAS which means that the tables should be sourced from the same database schema.
Things to keep in mind when using the ELT_Mapping component

It always helps to check the SQL generated for us in the "Generated SQL" tab of editor of the ELT_Mapping component for syntax errors.

Happy Talending!

-Praveena



Wednesday, October 5, 2011

Talend GOTCHA when connecting SubJobs

Novice Talend users must pay special attention when you are connecting sub-jobs. You might want to connect the SubJobs with "On Subjob OK", "On Component OK" but might accidentally select "iterate" option from "ROW" instead of "On Subjob OK" from "TRIGGER".

A picture could perhaps clarify the above blah-blah.



Happy Talending!
-Praveena

Tuesday, October 4, 2011

Commenting GOTCHA that a Talend Novice should keep in mind

COMMENTING in QUERY components of Talend can be a pain as the error messages that you get due to commenting in SQL style will not be intuitive.

JAVA commenting style -
// For single line commenting
/* .... */ Multi line commenting

SQL commenting style-
-- For single line commenting.
/* .... */ Multi line commenting

The QUERY in a query component like MySqlInput is JAVA string behind the scenes. So commenting using the hiphens(--) in SQL style will result in an error. So it is wise to choose /* */ multi line JAVA/SQL style comments  to be safe than sorry.

Happy Talending!
-Praveena











Tuesday, September 13, 2011

Subversion error - Working copy of a particular job is locked

If you run into “subversion error.. working copy of /jobName is locked"..

You can unlock a job in two ways.

  1. Execute the following command to unlock your job. (My TIS 4.2 is on Ubuntu)

    find /TalendWorkspaces/ -name lock |xargs rm
  2. Open the TIS/TOS
    go to Navigator--->your project name folder--> Delete the two files of old version job: jobName_oldVersion.item, jobName_oldVersion.properties, then return back to repository viewer and click on 'refresh' button.
( If you cannot find Navigator, you can view it by going to Window--> Show view --> General --> Navigator. )

Happy Talending!!

java.lang.RuntimeException: For update, Schema must have a key


This is a famous Talend runtime Exception that a novice could run into several times especially when using an DBoutput component and updating the rows. Talend Forums are not quite helpful in resolving this issue. The fix is very straightforward but very unintuitive to know about it in the first attempt.

The fix involves checking the "Update-Key" found in the Advanced Settings of the Component view.

Above is the screenshot of how to fix the Exception. Go to the "Component" tab --> "Advanced Settings" --> Update Key

Check the appropriate column in the check boxes under the update key heading. You might have to scroll down the page to find this setting.

How to install MYSQL on ubuntu

Open a terminal window, and use the following command:

sudo apt-get install mysql-server

If you are running PHP you will also need to install the php module for mysql 5:

sudo apt-get install php5-mysql

Thursday, September 8, 2011

Introduction to Talend

Talend is an open source software for Data Integration that is quickly becoming an alternative to expensive ETL tools in the BI space.

Talend has its presence in four areas especially, the Data Quality, Data Integration, Master Data Management and Application Integration.

Talend projects could be developed in Java or PERL. Talend has around 450 components currently and Talend's community has contributed many interesting components like tGoogleAnalytics.

TOS is the Talend's Open Studio for developing Talend jobs individually and TIS is the Integration Studio for collaboration.

Talend's version control is called SVN. TOS comes with a demo JAVA project that has sample jobs for quickly learning Talend in action.

I will try to post a sample job everyday in this blog of my-experiments-with-Talend.

Happy Talending!!