Microsoft SSIS Vs Talend
Microsoft SQL Server Integration Services (SSIS) and Talend are the two popular ETL applications I have worked with it beside Informatica and Pentaho. Those two applications will be later covered in later series. MS SSIS and Talend features are quite different but they provide same services in terms of ETL task. Listed below are some of the features but not the complete list.
Microsoft SSIS
- SSIS has two main components which are Data Flow and Control Flow.
- Most of the data cleaning are done in the Data Flow component including sorting, derived column, aggregation, row count, data conversion etc.
- The connection manager can be used to create connection properties for both the source and final destination strings using OLEDB , .NET Frame, Flat File etc.
- Control Flow defines the ETL environment which consists of Data Flow Task, Execute SQL Task, Bulk Insert Task, XML, Web Services, FTP, File System Task etc. Each task can be grouped into sections made up of multiple tasks along with Data Flow Task. Special conditions can be introduced to catch errors or redirect data flow.
- SSIS like Talend and Other ETL applications allows for parameters to restrict the data flow based on the underlying stored procedures or SQL Statements.
- Errors logs can be stored in database systems and the package configuration settings including name, user and timestamp information.
- SSIS like other ETL applications can be executed remotely or within different applications or packages in terms of SSIS.
- SSIS utilizes System Variable to write name or other information for flat file destinations.
- Script components enables for advance scripting within SSIS using C, C# ,C ++ or Visual Basic comments. For example Script component can be utilizes to display the number of rows generated for each execution within a defined variable created by the user.
- SSIS has many other tasks that can be utilized to transfer database, rebuild index or process OLAP Cube data for data mining.
- Business Intelligence Development Studio(BIDS) or SQL Server Data Tools (SSDT) are the only environment utilized to create SSIS packages.
- Packages can be deployed within BIDS ,SSDT, SQL Sever Database System or Execute Package Utility.
Talend Data Studio
- Talend runs on Java plarform and utilizes Java for advance scripting components.
- Talend utilizes JDBC to connect to other database systems. It has OLEDB and .Net components but JDBC is the best option to prevent any system errors like i have encountered. Even with JDBC you have to ensure it version aligns with the java in the system.
- Talend stores connection strings in the Metadata and it is where the connection strings are defined and established including XML, Excel, Flat File, Web Services, FTP etc.
- For parameters to be defined in Talend, Context has to be defined and attached to the underlying stored procedures.
- Talend utilizes Palette to specify the type of task to be performed similar to SSIS with Orchestration, Processing and Database are the three main important ones depending on the type of ETL work.
- Each database systems has main palette that defines specific type action that would be utilized within the connection string like tMSSqlclose, tMSSinput, tMSSqlOutput etc for SQL Server.
- Each object will display the type of information related to specific task like input based on the database system after the connection has been established and the schema objects have been processed in the metadata.
- Processing Palette has various tasks like tfilterrow, tFilterColumns, tAggregateRow, tJoin and tMap which are used to filter row or column, aggregate data, join multiple data source using primary key or map columns using expression to transform the data to a destination column.
- Just like SSIS , Talend also has looping task like tForeach and tloop used to process a task in sequential or cycle format.
- Talend calls each ETL works Jobs while SSIS calls it Package.
- Although I have not used Talend Scheduling and Logs & Error services, Talend does offer similar function like SSIS.
Comments
Post a Comment