This post demonstrates the incremental load in SSIS with example. Based on the business requirement and data volume we should determine the ETL process. We’ll assume we have a low to mid-sized source data file and here are the requirements:
We have a distributed database system and getting customer information on daily basis to load into SQL Server instance by following the below conditions:
- Customer information is sending in flat file format
- We are getting Delta Data Feed that means it includes new, updated and deleted customer’s data.
- Compare Source (Flat File) and Destination (SQL Server) customer ID
- If no match found at destination: Insert Row into Destination
- If match found and IsDeleted = 1 / True: Delete the row from Destination
- If match found: Update row from source to destination
- We are using Sales.Customer from [AdventureWorks].
- Be cautious in using OLEDB Command for large datasets as it’s a row by row process and impact the performance.
- In case of doing incremental load for large datasets, load data into staging table and then update / delete using a query through Execute SQL Task.
ETL Solution Design:
Flat File Source: To extract data from source flat file
Data Conversation: Convert source columns datatype to match the destination customer table
Lookup: Compare source and destination customerID column
- No-Match Output: When there is no match found in destination
- Match Output: When match found between source and destination
Conditional Split: Check if IsDeleted = 1 then redirect it to DELETE output. Redirect default output to UPDATE output.
OLEDB Destination: For inserting new records
OLEDB Command (Update): For updating rows
OLEDB Command (Delete): For deleting rows
ETL Package Design:
Open SQL Server Data Tools
Create a new Project name it as “Sales_ETL” and Package name it as “Customer_ETL”
Create two connection managers 1) Source – New Flat File Connection 2) New OLE DB Connection
Add a Data Flow Task to Control Flow and name it as “Customer_Data_ETL”
In Data Flow Task, add Flat File Source and name it as “Extract_Customer_From_FlatFile” and connect to the flat file source:
Add “Data Conversion” and convert datatype for all columns as per the destination table:
Add lookup transformation and configure as below
Full Cache: Sine we have a small reference data set we can choose Full Cache option. It means it capture the entire reference set into Memory instead of connecting database every time when required.
Redirect: Rows to No Match when no match found at destination.
Connections: Instead of getting entire table we are using a SQL Query as we required only CustomerID.
Here we are mapping source column “Copy of CustomerID” to destination column “CustomerID”.
To differentiate between source and reference make Output Alias as “Dst_CustomerID”.
Add a OLEDB Destination and name it as “Insert Customer”. Map No Match output from Lookup to OLEDB Destination:
Edit properties for OLEDB Destination “Insert Customer”, connect destination and configure properties and map columns:
Map columns from converted input columns to destination columns as below:
ETL package is ready to handle new customer INSERT. Now we need to handle “Updated” & “Deleted” customers. Add Conditional Split transformation and connect “Lookup Match Output”:
Now we need to configure Conditional Split transformation to identify the Deleted & Updated rows:
Output Name: Name it as “Delete Customer” and apply condition isdeleted == “True”
Default Output Name: “Updated Customer” as remaining all rows comes in default output where isdeleted <> “true”
Now we need UPDATE and DELETE customer data in destination for that we are going to use “OLE DB Command”:
Configure OLEDB Command – Update Customer
Configure OLEDB Command – Delete Customer
Finally the package data flow looks like below:
ETL Package Execution:
Case 1: Customer Table is empty and Flat File is having 1702 rows
Case 2: Passing Delta data; 15 New Rows, 10 Updated Rows, 5 Rows with IsDeleted = 1;
- This post explains a way to implement incremental data load using SSIS
- We used a small dataset to demonstrate the incremental load
- Most important point to be noted: “OLEDB Command” is a row based operation and execute the SQL Statement for each row coming input row.