ETL ( Automation vs Manual )
ETL applications like Talend, SSIS and Pentaho can simplify ETL or Data migration by eliminating or reducing the amount of SQL coding needed to complete the process. However same functions performed by ETL applications can be performed by scripting using various dialect of SQL and programming language like C# or Java. The downside of scripting manually without an ETL application is the need of connecting to data source and destination source. However most vendors like Microsoft SQL Server provides developers the option through linked server using SQL Server Management Studio or Oracle MySQL Workbench for MySQL. Depending on the application a third programming language is need as interface like Java or C++ when performing the process from a developmental application. This will allow both the source and destination sources to be linked. Another problem with the manual process is the complexity of SQL coding needed which needs to be altered constantly and the length of the codes involved. Stored procedures can be utilized to break down the SQL length and to hide the query definition using the database security protocols and to maintain database ACID properties. With ETL application all you have to define the source and destination and drag the necessary components to perform the cleaning process. You can view the codes behind each tasks like in Talend Open studio
Talend & Pentaho
- Runs on java and needs the right version of Java installed to function properly for the selected JDBC drivers.
- To establish connection at both ends you need a connection string that requires both the host name or the server name,address and the port name. This where authentication and server security plays a major role.
- When performing advanced task Java is used to write the necessary scripts and previous knowledge of java is needed like displaying row count.
- Talend and Pentaho must be installed using the correct version of Java Jre unless the application will not load or open. It is best to have different java folders when operating with multiple applications that runs on Java and edit the configuration files. Having OBIEE, Pentaho and Talend required different versions of Java on my server few years ago.
- Utilized .Net frame , ODBC and OLDB to connect to data source. .Net connectors has to be installed manually after the Visual Studio has been installed depending on the vendor and the version of BIDS or Visual Studio installed.
- For advance components in SSIS like Scripting components to display row count when the package is being executed or send log messages to the database C# is utilized to perform the task. I had to learn the basic in C# including library and class to perform the task.
Comments
Post a Comment