SSIS – Part 5
SSIS Interview Questions and Answers for Experienced and Fresher’s
SSIS Interview Questions and Answers Part 5
Here we are publishing series of posts on SSIS Interview questions and answers Part 5 for experienced and fresher’s. Below is the series 5.
Q. What are the SSIS package protection levels?
There are 6 different types of protection levels.
- Do not save sensitive – (When exporting using DTUTIL specify for protection- 0)
- Encrypt sensitive with user key – 1
- Encrypt sensitive with password – 2
- Encrypt all with password -3
- Encrypt all with user key – 4
- Rely on server storage
Do not save sensitive: makes the sensitive data unavailable to other users. If a different user opens the package, the sensitive information is replaced with blanks and the user must provide the sensitive information.
Encrypt sensitive with user key: Uses a key that is based on the current user profile to encrypt only the values of sensitive properties in the package. Only the same user who uses the same profile can load the package. If a different user opens the package, the sensitive information is replaced with blanks and the current user must provide new values for the sensitive data. If the user attempts to execute the package, package execution fails.
Encrypt sensitive with password: Uses a password to encrypt only the values of sensitive properties in the package. To open the package in SSIS Designer, the user must provide the package password. If the password is not provided, the package opens without the sensitive data and the current user must provide new values for sensitive data. If the user tries to execute the package without providing the password, package execution fails.
Encrypt all with password: Uses a password to encrypt the whole package. The user must provide the package password. Without the password the user cannot access or run the package.
Encrypt all with user key: Uses a key that is based on the current user profile to encrypt the whole package. Only the user who created or exported the package can open the package in SSIS Designer or run the package by using the dtexec command prompt utility
Rely on server storage: Protects the whole package using SQL Server database roles. This option is supported only when a package is saved to the SQL Server msdb database.
When it is time to deploy the packages, you have to change the protection level to one that does not depend on the developer’s user key. Therefore you typically have to select EncryptSensitiveWithPassword, or EncryptAllWithPassword. Encrypt the packages by assigning a temporary strong password that is also known to the operations team in the production environment.
Q. What are the phases of execution of a package when running with DTEXEC?
- Command sourcing phase: The command prompt reads the list of options and arguments
- Package load phase: The package specified by the /SQL, /FILE, or /DTS option is loaded.
- Configuration phase: Options are processed in this order:
- Options that set package flags, variables, and properties.
- Options that verify the package version and build.
- Options that configure the run-time behavior of the utility, such as reporting.
- Validation and execution phase: The package is run, or validated without running if the /VALIDATE option is specified.
Q. What are the exit codes from DTEXEC?
- 0: The package executed successfully.
- 1: The package failed.
- 3: The package was canceled by the user.
- 4: The utility was unable to locate the requested package.
- 5: The utility was unable to load the requested package.
- 6: The utility encountered an internal error of syntactic or semantic errors in the command line.
Q. Can you demonstrate the DTEXEC?
Execute a package located on file system:
DECLARE @returncode int
EXEC @returncode = xp_cmdshell ‘dtexec /f “C:\UpsertData.dtsx”‘
To execute an SSIS package saved to SQL Server using Windows Authentication:
dtexec /sq pkgOne /ser productionServer
To execute an SSIS package saved to the File System folder in the SSIS Package Store:
dtexec /dts “\File System\MyPackage”
To validate a package that uses Windows Authentication and is saved in SQL Server without executing the package:
dtexec /sq pkgOne /ser productionServer /va
To execute an SSIS package that is saved in the file system, and specify logging options:
dtexec /f “c:\pkgOne.dtsx” /l “DTS.LogProviderTextFile;c:\log.txt”
To execute an SSIS package that is saved in the file system and configured externally:
dtexec /f “c:\pkgOne.dtsx” /conf “c:\pkgOneConfig.cfg
Q. Process to upgrade DTS TO SSIS?
1. Choosing a DTS to SSIS Migration Strategy (Reactive/Proactive)
2. Capturing SSUA DTS Package Alerts (all categories of notifications)
3. Building a dev/test environment
4. Migrating the packages using the selected DTS to SSIS Migration Strategy
5. Testing/Correcting the resulting SSIS 2008 Packages in the dev/test environment
6. Deploying and reconfirming the resulting SSIS 2008 Packages work in production as expected
7. Removing the old DTS Packages from production w/optional SQL Server Agent Jobs
Q. Does all components are converted automatically from DTS TO SSIS?
Not all components can be upgraded. ActiveX transforms, for instance, present a challenge for the upgrade wizard, and may not be able to be migrated.
- Delete and recreate ODBC connections after package migration
- Reconfigure transaction settings after package migration
- Replace functionality of ActiveX script attached to package steps after package migration. Use Script task
- After migration, convert the Execute DTS 2000 Task that encapsulates the Analysis Services task to an Integration Services Analysis Services Processing task.
- After migration, re-create the functionality of the Dynamic Properties task by using Integration Services features such as variables, property expressions, and package configurations.
Q. Why is the need for data conversion transformations?
This transformation converts the datatype of input columns to different datatype and then route the data to output columns. This transformation can be used to:
- Change the datatype
- If datatype is string then for setting the column length
- If datatype is numeric then for setting decimal precision.
This data conversion transformation is very useful where you want to merge the data from different source into one. This transformation can remove the abnormality of the data. Example à The Company’s offices are located at different part of world. Each office has separate attendance tracking system in place. Some offices stores data in Access database, some in Oracle and some in SQL Server. Now you want to take data from all the offices and merged into one system. Since the datatypes in all these databases vary, it would be difficult to perform merge directly. Using this transformation, we can normalize them into single datatype and perform merg
Q. Explain why variables called the most powerful component of SSIS.
Variable allows us to dynamically control the package at runtime. Example: You have some custom code or script that determines the query parameter’s value. Now, we cannot have fixed value for query parameter. In such scenarios, we can use variables and refer the variable to query parameter. We can use variables for like:
- Updating the properties at runtime,
- Populating the query parameter value at runtime,
- Used in script task,
- Error handling logic
- With various looping logic.
Q. What are the for each loop enumerators available in SSIS?
- Below are the lists of various types of enumerators provided by SSIS Foreach Loop Container:
- Foreach File Enumerator: It enumerates files in a folder. The plus point here is it can traverse through subfolders also.
- Foreach Item Enumerator: It enumerates items in a collection. Like enumerating rows and columns in an Excel sheet.
- Foreach ADO Enumerator: Useful for enumerating rows in tables.
- Foreach ADO.NET Schema Rowset Enumerator: To enumerate through schema information about a data source. For example, to get list of tables in a database.
- Foreach From Variable Enumerator: Used to enumerate through the object contained in a variable. (if the object is enumerable)
- Foreach NodeList Enumerator: Used to enumerate the result set of an XML Path Language (XPath) expression.
- Foreach SMO Enumerator: It enumerates through SQL Server Management Objects (SMO) objects.
Q. We have a situation that needs to be push data into DB2 database from SQL Server. What connection manager you use to connect to DB2 running on AS/400?
Primary method to connect to DB2 is “Microsoft OLE DB Provider for DB2”. There is one more method using ADO.NET data providers \ ODBC Data provider.
OLEDB is always faster than ODBC, but there might be issues with OLEDB to DB2 while dealing with parameters in queries.
Q. What is “ActiveX Script” task? Does it available in SQL Server 2012?
- The ActiveX Script task provides a way to continue to use custom code that was developed using ActiveX script. ActiveX script task supports writing scripts using VBScript and Jscript and other languages installed in the local computer.
- This task is to just support’s the backward compatibility with the deprecated component DTS packages
- Now in SQL Server 2012 the “ActiveX Script” has to be upgraded to “Script Task”.
- “Script task” supports VB.Net and C#.Net
Q. What is the use of either “Script task” or “ActiveX Script”?
- Implementing customized business logics in SSIS packages. Example using the script task we can access table values, applies logic and those values can be added to SSIS variables.
- Performing complex computations for example modifying date formats using date functions
- To access data from sources for which no support from built-in connections, for example a script can use Active Directory Service Interface (ADSI) to access usernames from AD.
- To create a package specific performance counters for example a script can create a performance counter that can be updated when a complex task or poorly performed task executes.
Q. What is “Script Component”?
Script component is like a “Script Task” but it is designed for “Data Flow”. It can be useful in below scenarios.
- Apply multiple transformations to data instead of using multiple transformations in the data flow. For example, a script can add the values in two columns and then calculate the average of the sum.
- Use custom formulas and functions for example, validate passport numbers.
- Validate incoming column data and skip unmatched records
- Script Component support for inputs and outputs
- If used as a source: Supports multiple outputs
- If used as a transformation: Supports one input and multiple outputs
- If used as a destination: Supports one input
Q. Can we call a web service from SSIS? If Yes how?
Yes! We can call a web service using “Web Service” task. We have to provide HTTP connection manager and WebServiceDescriptionLanguage (WSDL) file. The output can be stored either in a variable or on file system (In XML, TXT etc)
Q. What is the use of derived column in SSIS?
Derived column transformation can process existing column data and apply some functionality.
For example to change the case of a string for a column: can replace the actual column by applying the expression UPPER(COLUMN) or LOWER(COLUMN).
Can also useful when need to calculate sum values example: Add a new column “Gross Value” by applying the expression (Column1+Column2)
Applying arithmetic operations like “Round” and calulating date and time differences etc.
In addition with this, it can deal with “NULL” values. When NULL values needs to be populated with blanks
If incase we can’t perform any of these kind of operations with dirived column we have another option called “Script Transform”
Q. Which config file we should use for storing SSIS package configurations?
There are different ways to do this. But it’s all depends on requirement and environment. I couldn’t see any problem which is resolved in one config type and can’t be resolved in other config option.
- If you are using a file system deployment, it probably makes more sense to use XML configuration files.
- If you are using a SQL Server deployment, it probably makes more sense to use SQL Server configurations.
- If your ETL solution is managed by the application owner or server administrator, it probably makes more sense to use XML configuration files.
- If your ETL solution is managed by the database administrator, it probably makes more sense to use SQL Server configurations.
- If project team members and/or administrators have past experience and success with a given configuration type, it probably makes sense to use that type unless there is some compelling project-specific reason to do otherwise.
Q. What are the possible issues in handling SSIS packages?
Mostly Data Conversion errors due to datatype mismatches – Truncation of strings, loosing some decimal points etc
Expression Evolution errors on run time: Unable to evaluate expressions at run time due to wrong comparisions etc
Package Validation Errors: When we configure a variable to locate a file on file system which is actually creates on run time the package files debugging as initially the file is not located at the specified path. To avoid these issues set the property “DelayValidation” to “True”
Package Configuration Issues: Always make sure that we are using the right package at the right environment. It always depends on package configuration. Package will be used on dev,test and prod environments with different config values. If wrong config values are passed to a SSIS package which may leads to loss of data or data corruption.
To avoid these issues two things we have to consider
1. Use a centralized database to store all SSIS package config values
2. Use different account (either domain or sql) for different environments
3. Tight the security by assigning only required permissions to the SSIS user accounts.
So that even though a dev package runs with the prod credentials it fails to connect to the instance.
When a Variable is using another Variable:
See we usually give a variable as a source for “Execute SQL Task”. But for the variable value is setting by evaluating an expression which is using another variable.
For example we have created a variable called “FileName” and it’s been using in Execute SQL Task. But a filename should be evaluated as “B2B_Reports_”+User:BatchID. Here BatchID is another variable.
By default it fails to evaluate this expression and to fix this we have to change the variable property “EvaluateAsExpression” to “True”
Running SSIS packages on 64 bit and dealing with Excel files:
Typically excel files are not provided with 64 bit drivers (Excel 2010 has 64 bit but not before).
So to deal with excel files from SSIS which is running on 64 bit is bit difficult task.
There is an option in SSIS which allows SSIS package to support 32 bit execution on 64 bit environment.
From project properties on debugging page an option called “Run64BitRunTime”. By default it’s set to be true for SSIS running on 64 bit. We have to modify this to false to handle with 32-bit support activities. Below are more reasons to use this option
From SSIS where it’s running on 64 bit:
We can’t call a ExecuteDTS package task as it doesn’t support 64 bit
It may raise errors while using Script Task or Script Component. Might be using Dotnet assembles or COM objects for which there might be no 64 bit support available or drivers are not installed.
Case Sensitive issues:
One of the popular data transformations is a Lookup. It compares column values from two tables. Unlike T-SQL SSIS is a case sensitive comparison so we have to be careful in handling with these transformations and tasks.
Q. What are event handlers in SSIS?
Event handlers allow MSBI developers to monitor and audit SSIS packages. Event handlers can be associated with SSIS components and executables. Any component that can be added to the control flow is called as “Executable” plus the package itself. Child components is considered to be child executable and parent is known as parent executable.
Q. What are the different types of event handlers?
Q. What are the general cases that event handlers can be helpful in?
Cleanup stage tables after a bulk load completed
Send an email when a specific component failed
Load lookup tables after a task completed
Retrieve system / resource information before starting a task.
Q. How to implement event handlers in SSIS?
Create log tables (As per the requirement) on centralized logging database
On BIDS / SSDT add event handler
Add control flow elements. Most of the times “Execute SQL Task”
Store the required information (RowCounts – Messages – Time durations – System / resource information).
We can use expressions and variables to capture this information.
Q. What is container hierarchy in attaching event handlers?
Container hierarchy plays a vital role in implementing “Event Handlers” in SSIS. If an event handler is attached to a “Package” (a package itself it’s a container), then, the event handler applies to all associated components of that package. We need not attach the event handlers to all of them separately. But if we want to switch off the event handlers to any of the specific component in a container simply change the property “Disable EventHandlers” to “TRUE”.
Q. How to implement SCD (Slowly changing dimension) type 2 using SSIS?
Type 2 means we have to keep historical data.
Assume that we have a table called “Employee_Stage” at Stage Server and “Employee_Archieved” at Archive Server.
Now we have to read data from stage and insert into archive instance.
We have to implement SCD type 2, means we have to keep the changed records information. For example for an employee a column “Designation” has been changed then a new row has to be inserted into archive.
While inserting there are three columns that helps us in identifying the old and current records for a given employee.
StartDate – startdate for the given designation
EndDate – enddate for the given designation
IsCurrent – Bit column : 1 – Current ; 0 – History
Let’s start designing SSIS package:
As usual create a SSIS project
- Create two connection managers. 1. Staging, 2 – Archive
- Drag and drop a dataflow task at control flow
- Open data flow task and add a OLEDB source and map it with stage connection manager
- Drag and drop “SCD transformation” to data flow task.
- Double click and configure SCD as below.
- Map “Archive” connection manager and choose the “Business Key” in the archive table.
Business key is nothing but a column which can be used to compare / lookup with stage table. Here I have given “EmpID” as a BusinessKey.
We have to mention “Change Type” for SCD columns.
There are three change types available as below
‘Fixed attribute’, ‘Changing attribute’ and ‘Historical Attribute’.
I do choose “Historical Attribute” for the column Designation. Based on this a new record will be inserted into archive if the column value is changed in stage table.
- Now give the historical attribute options. There are two options available to identify current and historical records. Based on a single column, “Based on two date values”
- Here I choose the first option and give “1” to current and “0” to expiration
Don’t select “Inferred member support” as this is not useful in this scenario.
- Click finish, it’ll automatically creates some transformations and destination that includes “derived Column” to add flag values to “IsCurrent” column, OLEDB Command to update the “IsCurrent” column and OLEDB destination to insert new records into archive table.
Note 1: To implement SCD type – 1 (Means overwrite the values) have to follow the same steps above. Instead of choosing “Hierarchical Attribute” choose “Changing Attribute”.
Note 2: “Fixed Attribute” can be useful at situations where to apply a domain rule for example the column “NationalNumber” has to be fixed. If the column is forced to overwritten then there would be an error or it would be redirected but it never allow to be changed.
Q. Can we use a temp table is data flow that is created in control flow?
Yes we can use.
Assume we are executing a stored procedure from “Execute SQL Task”. That stored procedure creates a global temp table on database and the same temp table has to be used in dataflow while creating OLEDB source, we can give a query like “SELECT * FROM ##TempTable”.
To use a temp table in SSIS from the same connection some of the properties has to be set as below.
From the properties of OLEDB connection manager change the value to “TRUE” for the property “RetainSameConnection”.
For OLEDB source in dataflow make sure the property “ValidateExternalMetadata” to “False” as it fails to locate the temp table at complaining phase.
1. Store all package configurations in SQL Server
See for example we are using the same package for all development, test and stage server every time we need to execute that package we need not have different packages instead we just need different configuration files by pointing the proper config file using XML and then choosing proper XML by using environment variable.
But I do suggest do validation at database side. For example instead of applying validation rules at package level, use a stored procedure at source to apply / check all validations and then from that select the data which can be directly loaded to destination.
By doing this there might be overhead at database but operation would be faster as validations can be applied or all rows in bulk set operation where as in SSIS the same validation has to be applied as row by row. And if any modifications required at validations, we can simply modify the stored procedure and need not touch the SSIS package.
ISNULL(LastName)? “Unknown last name”:LastName
ISNULL(DaysToManufacture + @AddDays)
See when we are using lookup on columns from source and destinations as we know that SSIS.Lookup is a case sensitive not like T-SQL. So beofore comparing two columns we can design data flow to pas those two columns through “CharecterMAP” and can convert data into a common format either “Lower” or “Upper” case.
Import Column Transformation – The Import Column transformation reads data from files and adds the data to columns in a data flow. Using this transformation, a package can add text and images stored in separate files to a data flow. *
Q. How matching happens inside the lookup transformation?
Lookup transformation tries to perform an equi-join between transformation input and reference dataset. By default unmatched row is considered as an error however we can configure lookup to redirect such rows as “no match output” (from 2008 and above).
If the reference data set is having multiple matches it returns only the first match. In-case the reference set is a cache then it raises a warning or error incase of multiple matches.
Q. What are all the inputs and outputs of a lookup transformation?
Input: Dataset from data source
Match Output: All matched rows
No Match Output: All not matched rows. If unmatched rows are not configured to redirect to error output then such rows are redirected to no match output
Q. Have you ever used Dataflow Discoverer (DFLD) is SSIS? If yes can you describe why and how?
Please have a look at below links to get the detailed explanation about DFLD
Q. How to transfer logins using SSIS?
It can be done using Transfer SQL Server Login. But there are limitations.
Transferring windows authentication logins to cross domain: Drop and recreate logins
Transferring SQL Logins: Need to change the password as a random password is chosen while moving from source to destination.
Best way to move logins is using scripts: Logins, users, role mapping scripts.
Q. How to troubleshoot connection error regarding
1. Incorrect Provider for the connection:
A) Lack of 64-bit provider: Remember on target server if the package is running on 32-bit, make sure that the execution option “Use 32-Bit runtime” is being selected while creating the job to execute the SSIS package.
B) Lack of client binary installed: Make sure client binaries installed on target server.
2. Incorrect connection parameters settings:
A) Typo in password
B) Password is not stored in configuration
3. Failed to decrypt the sensitive information: It usually happens when a SSIS package is executing from SQL Server agent job. If the package is saved with the option “SaveSensitiveWithUserKey” and the sql agent service account different from package creator.
4. Oracle Data Provider Limitation:
Another common scenario happens when you use Microsoft OLE DB Provider for Oracle or Microsoft ODBC Driver for Oracle to connect to Oracle9i or later version database. Recommended is Oracle OLE DB Provider for the Oracle 9i or later versions
Q. What are the logs available to check if a SSIS package fails?
1. Windows Event Log & Job History: When SSIS package scheduled from a SQL Job
2. Logs from SSIS Logging Audit: When a log provider configured for package
3. Logs from SSIS Event Handler: When event handler designed to capture the log
4. Logs from the SSIS components: When custom logging configured using Script task
5. Logs from underlying data sources: Check the error log at data source example SQL Server, Oracle etc.
For more MSBI stuff please have a look at below references:
SSIS Interview Questions and Answers Part 5