Showing posts with label Performance Tuning. Show all posts
Showing posts with label Performance Tuning. Show all posts

Thursday, October 31, 2013

Tips to interact with Databases efficiently using Talend

TIPS to interact efficiently with Databases using Talend

  1. In the Database Output component choose the "Enable Parallel Execution" (Last option in the Advance Settings)
  2. Use a reasonable batch-size (The default is 10,000 rows)
  3. Use the ELT components and Bulk Execute components where ever applicable.
  4. Choose only the columns you are interested in your data-sets. Avoid using select * from in your queries.
  5. Choose only those rows that you are interested in by limiting them with a WHERE clause
  6. Do not perform blocking operations like "Aggregate/Sorting" on the Talend. The DB is more efficient in doing these operations. For Aggregation/Sorting, the entire dataset has to be loaded before the operation is performed. So if the data-set is huge, it could choke the memory and cpu.
  7. Understand the "Commit" options. Commiting frequently has pros and cons


    •  Example - You have 20000 rows to be written to the database. Lets assume that we are commiting for every 5000 rows. If there is any error after the second commit (meaning after 5000 + 5000 = 10000 rows), our database will be in a state where rows have been written partially. Unless your code is re-runnable, you could end up in a situation where you need to first clean up before you start again..
    • Commiting frequently avoids huge log files on the database.
    • Commiting only at the end could have an impact on the memory usage if it is a large batch of rows that you are waiting to commit.

Parallelization in Talend

You can achieve Parallelization in Talend in 2 ways.

  1. Running SubJobs in Parallel by using the Multi-threaded Executions
    • Enabling Mulit-threaded Execution is hidden in the Jobs view of the studio.
    • Also note that enabling multi-thread on a single processor could hurt the performance
  2. Using the tParallelize component of Talend.
    • The tParallelize component is an Orchestration component.
Screenshots of a simple sample Job running as a single thread, multi-thread and with tParallize are shown below.

 Job running as a single thread

Job running as a single thread
Using Multi-threaded Execution





Multi-threaded Execution


Using the tParallelize component of Talend
Using TParallelize component of Talend








Friday, October 7, 2011

Talend ELT Components

Purpose of the ELT Talend Components :

The ELT Talend components (Input, Mapper and Output) are useful for bulk loading the tables by pushing the processing overhead to the Database server rather than iterating through the resultsets in memory.

Example ELT Components for MYSQL. 

The three components (input, map, output) are closely tied and have to be used together.
  1. tELTMySqlInput
  2. tELTMySqlMap
  3. tELTMySqlOutput
 What can you do with ELT Components :
  1. The main purpose of the ELT components is PERFORMANCE.
  2. Bulk read/insert into tables by using different tables and all types of JOINS, WHERE clauses and GROUP BY and HAVING clauses.
What can you NOT do with ELT Components :
  1. You cannot JOIN tables from different SCHEMAS which means that the tables should be sourced from the same database schema.
Things to keep in mind when using the ELT_Mapping component

It always helps to check the SQL generated for us in the "Generated SQL" tab of editor of the ELT_Mapping component for syntax errors.

Happy Talending!

-Praveena