ETL ( Automation vs Manual )



ETL applications like Talend, SSIS and Pentaho can simplify ETL or Data migration by eliminating or reducing the amount of SQL coding needed to complete the process. However same functions performed by ETL applications can be performed by scripting using various dialect of SQL and programming language like C# or Java. The downside of scripting manually without an ETL application is the need of connecting to data source and destination source. However most vendors like Microsoft SQL Server provides developers the option through linked server using SQL Server Management Studio or Oracle MySQL Workbench for MySQL.  Depending on the application a third programming language is need as interface like Java or C++ when performing the process from a developmental application. This will allow both the source and destination sources to be linked.  Another problem with the manual process is the complexity of SQL coding needed which needs to be altered constantly and the length of the codes involved. Stored procedures can be utilized to break down the SQL length and to hide the query definition using the database security protocols and to maintain database ACID properties. With ETL application all you have to define the source and destination and drag the necessary components to perform the cleaning process. You can view the codes behind each tasks like in Talend Open studio

Talend & Pentaho

  1. Runs on java and needs the right version of Java installed to function properly for the selected JDBC drivers.                                                                                         
  2. To establish connection at both ends you need a connection string  that requires both the host name or the server name,address  and the port name. This where authentication and server security plays a major role. 
  3. When performing advanced task Java is used to write the necessary scripts and previous knowledge of java is needed like displaying row count. 
  4. Talend and Pentaho must be installed using the correct version of Java Jre unless the application will not load or open. It is best  to have different java folders when operating with multiple applications that runs on Java and edit the configuration files. Having OBIEE, Pentaho  and Talend  required different versions of Java on my server few years ago.
SSIS
  1. Utilized .Net frame , ODBC and OLDB to connect to data source.  .Net  connectors has to be installed manually after the Visual Studio has been installed depending on the  vendor and the version of BIDS or Visual Studio installed. 
  2. For advance components in SSIS like Scripting components to display row count  when the package is being executed or send log messages to the database  C# is utilized to perform the task.  I had to learn the basic in C#  including library and class to perform the task.
Throughout my career i had the opportunity to work in the three popular ETL application.  SSIS is easiest to learn and used. Talend and Pentaho have a sharp learning curve. When you have learned Talend Open Studio you can work with another Java based application. The presentation layer are identical meaning the both perform same task like Join but the arrangement is different.

Comments

Popular posts from this blog

Data Migration

SSIS, SSAS & Stored Procedure