SSIS

36
Q1 Explain architecture of SSIS? SSIS architecture consists of four key parts: a) Integration Services service: monitors running Integration Services packages and manages the storage of packages. b) Integration Services object model: includes managed API for accessing Integration Services tools, command-line utilities, and custom applications. c) Integration Services runtime and run-time executables: it saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions. The Integration Services run-time executables are the package, containers, tasks, and event handlers that Integration Services includes, and custom tasks. d) Data flow engine: provides the in-memory buffers that move data from source to destination. Q2 How would you do Logging in SSIS? Logging Configuration provides an inbuilt feature which can log the detail of various events like onError, onWarning etc to the various options say a flat file, SqlServer table, XML or SQL Profiler.

Transcript of SSIS

Page 1: SSIS

Q1 Explain architecture of SSIS?SSIS architecture consists of four key parts:a) Integration Services service: monitors running Integration Services packages and manages the storage of packages.

b) Integration Services object model: includes managed API for accessing Integration Services tools, command-line utilities, and custom applications.c) Integration Services runtime and run-time executables: it saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions. The Integration Services run-time executables are the package, containers, tasks, and event handlers that Integration Services includes, and custom tasks.d) Data flow engine: provides the in-memory buffers that move data from source to destination.

Q2 How would you do Logging in SSIS?Logging Configuration provides an inbuilt feature which can log the detail of various events like onError, onWarning etc to the various options say a flat file, SqlServer table, XML or SQL Profiler.

Q3 How would you do Error Handling?A SSIS package could mainly have two types of errorsa) Procedure Error: Can be handled in Control flow through the precedence control and redirecting the execution flow.b) Data Error: is handled in DATA FLOW TASK buy redirecting the data flow using Error Output of a component.

Page 2: SSIS

Q4 How to pass property value at Run time? How do you implement Package Configuration?A property value like connection string for a Connection Manager can be passed to the pkg using package configurations.Package Configuration provides different options like XML File, Environment Variables, SQL Server Table, Registry Value or Parent package variable.

Q5 How would you deploy a SSIS Package on production? A) Through Manifest1. Create deployment utility by setting its propery as true .2. It will be created in the bin folder of the solution as soon as package is build.3. Copy all the files in the utility and use manifest file to deply it on the Prod.B) Using DtsExec.exe utilityC)Import Package directly in MSDB from SSMS by logging in Integration Services.

Q6 Difference between DTS and SSIS?Every thing except both are product of Microsoft :-).

Q7 What are new features in SSIS 2008?explained in other posthttp://sqlserversolutions.blogspot.com/2009/01/new-improvementfeatures-in-ssis-2008.html

Q8 How would you pass a variable value to Child Package?too big to fit here so had a write other posthttp://sqlserversolutions.blogspot.com/2009/02/passing-variable-to-child-package-from.html

Q9 What is Execution Tree?Execution trees demonstrate how package uses buffers and threads. At run time, the data flow engine breaks down Data Flow task operations into execution trees. These execution trees specify how buffers and threads are allocated in the package. Each tree creates a new buffer and may execute on a different thread. When a new buffer is created such as when a partially blocking or blocking transformation is added to the pipeline, additional memory is required to handle the data transformation and each new tree may also give you an additional worker thread.

Q10 What are the points to keep in mind for performance improvement of the package?http://technet.microsoft.com/en-us/library/cc966529.aspx

Q11 You may get a question stating a scenario and then asking you how would you create a package for that e.g. How would you configure a data flow task so that it can transfer data to different table based on the city name in a source table column?

Q13 Difference between Unionall and Merge Join?a) Merge transformation can accept only two inputs whereas Union all can take

Page 3: SSIS

more than two inputs

b) Data has to be sorted before Merge Transformation whereas Union all doesn't have any condition like that.

Q14 May get question regarding what X transformation do?Lookup, fuzzy lookup, fuzzy grouping transformation are my favorites.For you.

Q15 How would you restart package from previous failure point?What are Checkpoints and how can we implement in SSIS?When a package is configured to use checkpoints, information about package execution is written to a checkpoint file. When the failed package is rerun, the checkpoint file is used to restart the package from the point of failure. If the package runs successfully, the checkpoint file is deleted, and then re-created the next time that the package is run.

Q16 Where are SSIS package stored in the SQL Server?MSDB.sysdtspackages90 stores the actual content and ssydtscategories, sysdtslog90, sysdtspackagefolders90, sysdtspackagelog, sysdtssteplog, and sysdtstasklog do the supporting roles.

Q17 How would you schedule a SSIS packages?Using SQL Server Agent. Read about Scheduling a job on Sql server Agent

Q18 Difference between asynchronous and synchronous transformations? Asynchronous transformation have different Input and Output buffers and it is up to the component designer in an Async component to provide a column structure to the output buffer and hook up the data from the input.

Q19 How to achieve parallelism in SSIS?Parallelism is achieved using MaxConcurrentExecutable property of the package. Its default is -1 and is calculated as number of processors + 2.

-More questions added-Sept 2011 Q20 How do you do incremental load?

Fastest way to do incremental load is by using Timestamp column in source table and then storing last ETL timestamp, In ETL process pick all the rows having Timestamp greater than the stored Timestamp so as to pick only new and updated records

Q21 How to handle Late Arriving Dimension or Early Arriving Facts. 

Late arriving dimensions sometime get unavoidable 'coz delay or error in Dimension ETL or may be due to logic of ETL. To handle Late Arriving facts, we can create dummy Dimension with natural/business key and keep rest of the attributes as null or default.  And as soon as Actual dimension arrives, the dummy dimension is updated with Type 1 change. These are also known as Inferred Dimensions.

Page 4: SSIS

What is SQL Server Integration Services (SSIS)?

SQL Server Integration Services (SSIS) is component of SQL Server 2005 and later versions. SSIS is an enterprise scale ETL (Extraction, Transformation and Load) tool which allows you to develop data integration and workflow solutions. Apart from data integration, SSIS can be used to define workflows to automate updating multi-dimensional cubes and automating maintenance tasks for SQL Server databases.

How does SSIS differ from DTS?

SSIS is a successor to DTS (Data Transformation Services) and has been completely re-written from scratch to overcome the limitations of DTS which was available in SQL Server 2000 and earlier versions. A significant improvement is the segregation of the control/work flow from the data flow and the ability to use a buffer/memory oriented architecture for data flows and transformations which improve performance.

What is the Control Flow?

When you start working with SSIS, you first create a package which is nothing but a collection of tasks or package components.  The control flow allows you to order the workflow, so you can ensure tasks/components get executed in the appropriate order.

What is the Data Flow Engine?

The Data Flow Engine, also called the SSIS pipeline engine, is responsible for managing the flow of data from the source to the destination and performing transformations (lookups, data cleansing etc.).  Data flow uses memory oriented architecture, called buffers, during the data flow and transformations which allows it to execute extremely fast. This means the SSIS pipeline engine pulls data from the source, stores it in buffers (in-memory), does the requested transformations in the buffers and writes to the destination. The benefit is that it provides the fastest transformation as it happens in memory and we don't need to stage the data for transformations in most cases.

What is a Transformation?

 A transformation simply means bringing in the data in a desired format. For example you are pulling data from the source and want to ensure only distinct records are written to the destination, so duplicates are  removed.  Anther example is if you have master/reference data and want to pull only related data from the source and hence you need some sort of lookup. There

Page 5: SSIS

are around 30 transformation tasks available and this can be extended further with custom built tasks if needed.

What is a Task?

A task is very much like a method of any programming language which represents or carries out an individual unit of work. There are broadly two categories of tasks in SSIS, Control Flow tasks and Database Maintenance tasks. All Control Flow tasks are operational in nature except Data Flow tasks. Although there are around 30 control flow tasks which you can use in your package you can also develop your own custom tasks with your choice of .NET programming language.

What is a Precedence Constraint and what types of Precedence Constraint are there?

SSIS allows you to place as many as tasks you want to be placed in control flow. You can connect all these tasks using connectors called Precedence Constraints. Precedence Constraints allow you to define the logical sequence of tasks in the order they should be executed. You can also specify a condition to be evaluated before the next task in the flow is executed.

These are the types of precedence constraints and the condition could be either a constraint, an expression or both 

o Success (next task will be executed only when the last task completed successfully) or

o Failure (next task will be executed only when the last task failed) or o Complete (next task will be executed no matter the last task was

completed or failed).

What is a container and how many types of containers are there?

A container is a logical grouping of tasks which allows you to manage the scope of the tasks together.

These are the types of containers in SSIS: o Sequence Container - Used for grouping logically related tasks

togethero For Loop Container - Used when you want to have repeating flow in

packageo For Each Loop Container - Used for enumerating each object in a

collection; for example a record set or a list of files. Apart from the above mentioned containers, there is one more container

called the Task Host Container which is not visible from the IDE, but every task is contained in it (the default container for all the tasks).

What are variables and what is variable scope?

A variable is used to store values. There are basically two types of variables, System Variable (like ErrorCode, ErrorDescription, PackageName etc) whose values you can use but cannot change and User Variable which you create,

Page 6: SSIS

assign values and read as needed. A variable can hold a value of the data type you have chosen when you defined the variable.

Variables can have a different scope depending on where it was defined. For example you can have package level variables which are accessible to all the tasks in the package and there could also be container level variables which are accessible only to those tasks that are within the container.

What are SSIS Connection Managers?

When we talk of integrating data, we are actually pulling data from different sources and writing it to a destination. But how do you get connected to the source and destination systems? This is where the connection managers come into the picture. Connection manager represent a connection to a system which includes data provider information, the server name, database name, authentication mechanism, etc. For more information check out the SQL Server Integration Services (SSIS) Connection Managers and Connection Managers in SQL Server 2005 Integration Services SSIS tips.

What is the RetainSameConnection property and what is its impact?

Whenever a task uses a connection manager to connect to source or destination database, a connection is opened and closed with the execution of that task. Sometimes you might need to open a connection, execute multiple tasks and close it at the end of the execution. This is where RetainSameConnection property of the connection manager might help you. When you set this property to TRUE, the connection will be opened on first time it is used and remain open until execution of the package completes.

What are a source and destination adapters?

A source adaptor basically indicates a source in Data Flow to pull data from. The source adapter uses a connection manager to connect to a source and along with it you can also specify the query method and query to pull data from the source.

Similar to a source adaptor, the destination adapter indicates a destination in the Data Flow to write data to. Again like the source adapter, the destination adapter also uses a connection manager to connect to a target system and along with that you also specify the target table and writing mode, i.e. write one row at a time or do a bulk insert as well as several other properties.

Please note, the source and destination adapters can both use the same connection manager if you are reading and writing to the same database.

What is the Data Path and how is it different from a Precedence Constraint?

Data Path is used in a Data Flow task to connect to different components of a Data Flow and show transition of the data from one component to another. A data path contains the meta information of the data flowing through it, such as the columns, data type, size, etc. When we talk about differences between

Page 7: SSIS

the data path and precedence constraint; the data path is used in the data flow, which shows the flow of data. Whereas the precedence constraint is used in control flow, which shows control flow or transition from one task to another task.

What is a Data Viewer utility and what it is used for?

The data viewer utility is used in Business Intelligence Development Studio during development or when troubleshooting an SSIS Package. The data viewer utility is placed on a data path to see what data is flowing through that specific data path during execution. The data viewer utility displays rows from a single buffer at a time, so you can click on the next or previous icons to go forward and backward to display data. Check out the Data Viewer enhancements in SQL Server Denali.

What is an SSIS breakpoint? How do you configure it? How do you disable or delete it?

A breakpoint allows you to pause the execution of the package in Business Intelligence Development Studio during development or when troubleshooting an SSIS Package. You can right click on the task in control flow, click on Edit Breakpoint menu and from the Set Breakpoint window, you specify when you want execution to be halted/paused. For example OnPreExecute, OnPostExecute, OnError events, etc. To toggle a breakpoint, delete all breakpoints and disable all breakpoints go to the Debug menu and click on the respective menu item. You can event specify different conditions to hit the breakpoint as well. To learn more about breakpoints, refer to Breakpoints in SQL Server 2005 Integration Services SSIS.

What is SSIS event logging?

Like any other modern programming language, SSIS also raises different events during package execution life cycle. You can enable or write these events to trace the execution of your SSIS package and its tasks. You can also can write your custom message as a custom log. You can enable event logging at the package level as well as at the tasks level. You can also choose any specific event of a task or a package to be logged. This is essential when you are troubleshooting your package and trying to understand a performance problem or root cause of a failure. Check out this tip about Custom Logging in SQL Server Integration Services SSIS.

What are the different SSIS log providers?

There are several places where you can log execution data generated by an SSIS event log:

o SSIS log provider for Text files o SSIS log provider for Windows Event Log o SSIS log provider for XML files o SSIS log provider for SQL Profiler

Page 8: SSIS

o SSIS log provider for SQL Server, which writes the data to the msdb..sysdtslog90 or msdb..sysssislog table depending on the SQL Server version.

How do you enable SSIS event logging?

SSIS provides a granular level of control in deciding what to log and where to log. To enable event logging for an SSIS Package, right click in the control flow area of the package and click on Logging. In the Configure SSIS Logs window you will notice all the tasks of the package are listed on the left side of the tree view. You can specifically choose which tasks you want to enable logging. On the right side you will notice two tabs; on the Providers and Logs tab you specify where you want to write the logs, you can write it to one or more log providers together. On the Details tab you can specify what events do you want to log for the selected task.

Please note, enabling event logging is immensely helpful when you are troubleshooting a package, but also incurs additional overhead on SSIS in order to log the events and information. Hence you should only enabling event logging when needed and only choose events which you want to log. Avoid logging all the events unnecessarily.

What is the LoggingMode property?

SSIS packages and all of the associated tasks or components have a property called LoggingMode. This property accepts three possible values: Enabled - to enable logging of that component, Disabled - to disable logging of that component and UseParentSetting - to use parent's setting of that component to decide whether or not to log the data.

What is the transaction support feature in SSIS?

When you execute a package, every task of the package executes in its own transaction. What if you want to execute two or more tasks in a single transaction? This is where the transaction support feature helps. You can group all your logically related tasks in single group.  Next you can set the transaction property appropriately to enable a transaction so that all the tasks of the package run in a single transaction. This way you can ensure either all of the tasks complete successfully or if any of them fails, the transaction gets roll-backed too.

What properties do you need to configure in order to use the transaction feature in SSIS?

Suppose you want to execute 5 tasks in a single transaction, in this case you can place all 5 tasks in a Sequence Container and set the TransactionOption and IsolationLevel properties appropriately.

o The TransactionOption property expects one of these three values:

Page 9: SSIS

Supported - The container/task does not create a separate transaction, but if the parent object has already initiated a transaction then participate in it

Required - The container/task creates a new transaction irrespective of any transaction initiated by the parent object

NotSupported - The container/task neither creates a transaction nor participates in any transaction initiated by the parent object

Isolation level dictates how two more transaction maintains consistency and concurrency when they are running in parallel. To learn more about Transaction and Isolation Level, refer to this tip.

When I enabled transactions in an SSIS package, it failed with this exception: "The Transaction Manager is not available. The DTC transaction failed to start."  What caused this exception and how can it be fixed?

SSIS uses the MS DTC (Microsoft Distributed Transaction Coordinator) Windows Service for transaction support.  As such, you need to ensure this service is running on the machine where you are actually executing the SSIS packages or the package execution will fail with the exception message as indicated in this question.

What is event handling in SSIS?

Like many other programming languages, SSIS and its components raise different events during the execution of the code. You can write an even handler to capture the event and handle it in a few different ways. For example consider you have a data flow task and before execution of this data flow task you want to make some environmental changes such as creating a table to write data into, deleting/truncating a table you want to write, etc. Along the same lines, after execution of the data flow task you want to cleanup some staging tables. In this circumstance you can write an event handler for the OnPreExcute event of the data flow task which gets executed before the actual execution of the data flow. Similar to that you can also write an event handler for OnPostExecute event of the data flow task which gets executed after the execution of the actual data flow task.  Please note, not all the tasks raise the same events as others. There might be some specific events related to a specific task that you can use with one object and not with others.

How do you write an event handler?

First, open your SSIS package in Business Intelligence Development Studio (BIDS) and click on the Event Handlers tab. Next, select the executable/task from the left side combo-box and then select the event you want to write the handler in the right side combo box. Finally, click on the hyperlink to create the event handler. So far you have only created the event handler, you have not specified any sort of action. For that simply drag the required task from the toolbox on the event handler designer surface and configure it appropriately. To learn more about event handling, click here.

Page 10: SSIS

What is the DisableEventHandlers property used for?

Consider you have a task or package with several event handlers, but for some reason you do not want event handlers to be called. One simple solution is to delete all of the event handlers, but that would not be viable if you want to use them in the future. This is where you can use the DisableEventHandlers property.  You can set this property to TRUE and all event handlers will be disabled. Please note with this property you simply disable the event handlers and you are not actually removing them.  This means you can set this value to FALSE and the event handlers will once again be executed.

What is SSIS validation?

SSIS validates the package and all of it's tasks to ensure it has been configured correctly.  With a given set of configurations and values, all the tasks and package will execute successfully. In other words, during the validation process, SSIS checks if the source and destination locations are accessible and the meta data about the source and destination tables are stored with the package are correct, so that the task will not fail if executed. The validation process reports warnings and errors depending on the validation failure detected. For example, if the source/destination tables/columns get changed/dropped it will show as error. Whereas if you are accessing more columns than used to write to the destination object this will be flagged as a warning. To learn about validation click here.

Define design time validation versus run time validation.

Design time validation is performed when you are opening your package in BIDS whereas run time validation is performed when you are actually executing the package.

Define early validation (package level validation) versus late validation (component level validation).

When a package is executed, the package goes through the validation process.  All of the components/tasks of package are validated before actually starting the package execution. This is called early validation or package level validation. During execution of a package, SSIS validates the component/task again before executing that particular component/task.  This is called late validation or component level validation.

What is DelayValidation and what is the significance?

As I said before, during early validation all of the components of the package are validated along with the package itself. If any of the component/task fails to validate, SSIS will not start the package execution. In most cases this is fine, but what if the second task is dependent on the first task?  For example, say you are creating a table in the first task and referring to the same table in

Page 11: SSIS

the second task? When early validation starts, it will not be able to validate the second task as the dependent table has not been created yet.  Keep in mind that early validation is performed before the package execution starts. So what should we do in this case?  How can we ensure the package is executed successfully and the logically flow of the package is correct?  This is where you can use the DelayValidation property.  In the above scenario you should set the DelayValidation property of the second task to TRUE in which case early validation i.e. package level validation is skipped for that task and that task would only be validated during late validation i.e. component level validation. Please note using the DelayValidation property you can only skip early validation for that specific task, there is no way to skip late or component level validation.

What are the different components in the SSIS architecture?

The SSIS architecture comprises of four main components: o The SSIS runtime engine manages the workflow of the package o The data flow pipeline engine manages the flow of data from source to

destination and in-memory transformations o The SSIS object model is used for programmatically creating,

managing and monitoring SSIS packages o The SSIS windows service allows managing and monitoring packages

To learn more about the architecture click here.

How is SSIS runtime engine different from the SSIS dataflow pipeline engine?

The SSIS Runtime Engine manages the workflow of the packages during runtime, which means its role is to execute the tasks in a defined sequence.  As you know, you can define the sequence using precedence constraints. This engine is also responsible for providing support for event logging, breakpoints in the BIDS designer, package configuration, transactions and connections. The SSIS Runtime engine has been designed to support concurrent/parallel execution of tasks in the package.

The Dataflow Pipeline Engine is responsible for executing the data flow tasks of the package. It creates a dataflow pipeline by allocating in-memory structure for storing data in-transit. This means, the engine pulls data from source, stores it in memory, executes the required transformation in the data stored in memory and finally loads the data to the destination. Like the SSIS runtime engine, the Dataflow pipeline has been designed to do its work in parallel by creating multiple threads and enabling them to run multiple execution trees/units in parallel.

How is a synchronous (non-blocking) transformation different from an asynchronous (blocking) transformation in SQL Server Integration Services?

Page 12: SSIS

A transformation changes the data in the required format before loading it to the destination or passing the data down the path. The transformation can be categorized in Synchronous and Asynchronous transformation.

A transformation is called synchronous when it processes each incoming row (modify the data in required format in place only so that the layout of the result-set remains same) and passes them down the hierarchy/path. It means, output rows are synchronous with the input rows (1:1 relationship between input and output rows) and hence it uses the same allocated buffer set/memory and does not require additional memory. Please note, these kinds of transformations have lower memory requirements as they work on a row-by-row basis (and hence run quite faster) and do not block the data flow in the pipeline. Some of the examples are : Lookup, Derived Columns, Data Conversion, Copy column, Multicast, Row count transformations, etc.

A transformation is called Asynchronous when it requires all incoming rows to be stored locally in the memory before it can start producing output rows. For example, with an Aggregate Transformation, it requires all the rows to be loaded and stored in memory before it can aggregate and produce the output rows. This way you can see input rows are not in sync with output rows and more memory is required to store the whole set of data (no memory reuse) for both the data input and output. These kind of transformations have higher memory requirements (and there are high chances of buffer spooling to disk if insufficient memory is available) and generally runs slower. The asynchronous transformations are also called "blocking transformations" because of its nature of blocking the output rows unless all input rows are read into memory. To learn more about it click here.

What is the difference between a partially blocking transformation versus a fully blocking transformation in SQL Server Integration Services?

Asynchronous transformations, as discussed in last question, can be further divided in two categories depending on their blocking behavior:

o Partially Blocking Transformations do not block the output until a full read of the inputs occur.  However, they require new buffers/memory to be allocated to store the newly created result-set because the output from these kind of transformations differs from the input set. For example, Merge Join transformation joins two sorted inputs and produces a merged output. In this case if you notice, the data flow pipeline engine creates two input sets of memory, but the merged output from the transformation requires another set of output buffers as structure of the output rows which are different from the input rows. It means the memory requirement for this type of transformations is higher than synchronous transformations where the transformation is completed in place.

o Full Blocking Transformations, apart from requiring an additional set of output buffers, also blocks the output completely unless the whole input set is read. For example, the Sort Transformation requires all input rows to be available before it can start sorting and pass down the rows to the output path. These kind of transformations are most expensive and should be used only as needed. For example, if you can get sorted data from the source system, use that logic instead of using

Page 13: SSIS

a Sort transformation to sort the data in transit/memory. To learn more about it click here.

What is an SSIS execution tree and how can I analyze the execution trees of a data flow task?

The work to be done in the data flow task is divided into multiple chunks, which are called execution units, by the dataflow pipeline engine.  Each represents a group of transformations. The individual execution unit is called an execution tree, which can be executed by separate thread along with other execution trees in a parallel manner. The memory structure is also called a data buffer, which gets created by the data flow pipeline engine and has the scope of each individual execution tree. An execution tree normally starts at either the source or an asynchronous transformation and ends at the first asynchronous transformation or a destination. During execution of the execution tree, the source reads the data, then stores the data to a buffer, executes the transformation in the buffer and passes the buffer to the next execution tree in the path by passing the pointers to the buffers. To learn more about it click here.

To see how many execution trees are getting created and how many rows are getting stored in each buffer for a individual data flow task, you can enable logging of these events of data flow task: PipelineExecutionTrees, PipelineComponentTime, PipelineInitialization, BufferSizeTunning, etc. To learn more about events that can be logged click here.

How can an SSIS package be scheduled to execute at a defined time or at a defined interval per day?

You can configure a SQL Server Agent Job with a job step type of SQL Server Integration Services Package, the job invokes the dtexec command line utility internally to execute the package. You can run the job (and in turn the SSIS package) on demand or you can create a schedule for a one time need or on a reoccurring basis. Refer to this tip to learn more about it.

What is an SSIS Proxy account and why would you create it?

When we try to execute an SSIS package from a SQL Server Agent Job it fails with the message "Non-SysAdmins have been denied permission to run DTS Execution job steps without a proxy account". This error message is generated if the account under which SQL Server Agent Service is running and the job owner is not a sysadmin on the instance or the job step is not set to run under a proxy account associated with the SSIS subsystem. Refer to this tip to learn more about it.

How can you configure your SSIS package to run in 32-bit mode on 64-bit machine when using some data providers which are not available on the 64-bit platform?

Page 14: SSIS

In order to run an SSIS package in 32-bit mode the SSIS project property Run64BitRuntime needs to be set to "False".  The default configuration for this property is "True".  This configuration is an instruction to load the 32-bit runtime environment rather than 64-bit, and your packages will still run without any additional changes. The property can be found under SSIS

Project Property Pages -> Configuration Properties -> Debugging. Difference between control flow and data flow? Control flow deals with orderly processing of individual, isolated tasks, these

tasks are linked through precedence constraints in random order. Also the output for task has finite outcome i.e., Success, Failure, or Completion. A subsequent task does not initiate unless its predecessor has completed. Data flow, on the other hand, streams the data in pipeline manner from its source to a destination and modifying it in between by applying transformations. Another distinction between them is the absence of a mechanism that would allow direct transfer of data between individual control flow tasks. On the other hand, data flow lacks nesting capabilities provided by containers.

Control Flow Data FlowProcess Oriented Data Oriented

Made up of Tasks and ContainerSource, Transformation and Destination

Connected through

Precedence constraint Paths

Smallest unit Task Component

OutcomeFinite- Success, Failure, Completion

Not fixed

  If you want to send some data from Access database to SQL server

database. What are different component of SSIS will you use? In the data flow, we will use one OLE DB source, data conversion

transformation and one OLE DB destination or SQL server destination. OLE DB source is data source is useful for reading data from Oracle, SQL Server and Access databases. Data Conversion transformation would be needed to remove datatype abnormality since there is difference in datatype between the two databases (Access and SQL Server) mentioned. If our database server is stored on and package is run from same machine, we can use SQL Server destination otherwise we need to use OLE DB destination. The SQL Server destination is the destination that optimizes the SQL Server.  

Difference and similarity between merge and merge join transformation?

Merge TransofrmationsMerge Join Transformation

The data from 2 input paths are merged into one

The data from 2 inputs are merged based on some common key.

Page 15: SSIS

Works as UNION ALL JOIN (LEFT, RIGHT OR FULL)Supports 2 Datasets  1 Dataset

ColumnsMetadata for all columns needs to be same

Key columns metadata needs to be same.

Pre-requisites

Data must be sorted.

Merged columns should have same datatype i.e. if merged column is EmployeeName with string of 25 character in Input 1, it can be of less than or equal to 25 characters for merging to happen.

Data must be sorted.

Merged columns should have same datatype i.e. if merged column is EmployeeName with string of 25 character in Input 1, it can be of less than or equal to 25 characters for merging to happen.

Limitations

Only 2 input paths can be merged.

Does not support error handling.

Does not support error handling.

Use

Merging of data from 2 data source

Can create complex datasets using nesting merge transformation,

When data from 2 tables having foreign key relationship needs to present based on common key.

What is precedence constraint?

A precedence constraint is a link between 2 control flow tasks and lays down the condition on which the second task is run. They are used to control the workflow of the package. There are 3 kinds of precedence constraint – success (green arrow), failure (red arrow) or Completion script task (blue arrow). By default, when we add 2 tasks, it links by green arrow. The way the precedence constraint is evaluated can be based on outcome of the initial task. Also, we can add expression to evaluate such outcome. Any expression that can be judged as true or false can be used for such purpose. The precedence constraint is very useful in error handling in SSIS package.

Your browser may not support display of this image.

Explain why variables called the most powerful component of SSIS.

Variable allows us to dynamically control the package at runtime. Example: You have some custom code or script that determines the query parameter’s value. Now, we cannot have fixed value for query parameter. In such scenarios, we can

Page 16: SSIS

use variables and refer the variable to query parameter. We can use variables for like:1. updating the properties at runtime, 2. populating the query parameter value at runtime, 3. used in script task, 4. Error handling logic and 5. With various looping logic.

Can we add our custom code in SSIS?

We can customize SSIS through code by using Script Task. The main purpose of this task is to control the flow of the package. This is very useful in the scenario where the functionality you want to implement is not available in existing control flow item.

To add your own code:-

1. In control flow tab, drag and drop Script Task from toolbox. 2. Double click on script task to open and select edit to open Script task editor. 3. In script task editor, there are 3 main properties     i.) General – Here you can specify name and description     ii.) Script – through this we can add our code by clicking on Design Script button. The scripting language present is      VB.Net only.     iii.) Expression

What is conditional split?

As the name suggest, this transformation splits the data based on condition and route them to different path. The logic for this transformation is based on CASE statement. The condition for this transformation is an expression. This transformation also provides us with default output, where rows matching no condition are routed. Conditional split is useful in scenarios like Telecom industry data you want to divide the customer data on gender, condition would be:

GENDER == ‘F’

Explain the use of containers in SSIS and also their types.

Containers can be defined as objects that stores one or more tasks. The primary purpose of container is grouping logically related tasks. Once the task is placed into the containers, we can perform various operations such as looping on container level until the desired criterion is met. Nesting of container is allowed. Container is placed inside the control flow. There are 4 types of Container:-

Page 17: SSIS

1. Task Host container- Only one task is placed inside the container. This is default container. 2. Sequence Container – This container can be defined as subset of package control flow. 3. For loop container – Allows looping based on condition. Runs a control flow till condition is met. 4. For each loop container - Loop through container based on enumerator.

Why is the need for data conversion transformations?

This transformation converts the datatype of input columns to different datatype and then route the data to output columns. This transformation can be used to:1. Change the datatype 2. If datatype is string then for setting the column length 3. If datatype is numeric then for setting decimal precision.

This data conversion transformation is very useful where you want to merge the data from different source into one. This transformation can remove the abnormality of the data. Example à The Company’s offices are located at different part of world. Each office has separate attendance tracking system in place. Some offices stores data in Access database, some in Oracle and some in SQL Server. Now you want to take data from all the offices and merged into one system. Since the datatypes in all these databases vary, it would be difficult to perform merge directly. Using this transformation, we can normalize them into single datatype and perform merge.

Error Handling in SSIS?

An error handler allows us to create flows to handle errors in the package in quite an easy way. Through event handler tab, we can name the event on which we want to handle errors and the task that needs to be performed when such an error arises. We can also add sending mail functionality in event of any error through SMTP Task in Event handler. This is quite useful in event of any failure in office non-working hours. In Data flow, we can handle errors for each connection through following failure path or red arrow.

1) What is the control flow2) what is a data flow3) how do you do error handling in SSIS4) how do you do logging in ssis5) how do you deploy ssis packages.6) how do you schedule ssis packages to run on the fly7) how do you run stored procedure and get data8) A scenario: Want to insert a text file into database table, but during the upload want to change a column called as months - January, Feb, etc to a code, - 1,2,3.. .This code can be read from another database table called months. After the conversion of the data , upload the file. If there are any errors, write to error table. Then for all errors, read errors from database, create a file, and mail it to the supervisor. How would you accomplish this task in SSIS?

Page 18: SSIS

9)what are variables and what is variable scope ?AnswersFor Q 1 and 2: In SSIS a workflow is called a control-flow. A control-flow links together our modular data-flows as a series of operations in order to achieve a desired result.

A control flow consists of one or more tasks and containers that execute when the package runs. To control order or define the conditions for running the next task or container in the package control flow, you use precedence constraints to connect the tasks and containers in a package. A subset of tasks and containers can also be grouped and run repeatedly as a unit within the package control flow.

SQL Server 2005 Integration Services (SSIS) provides three different types of control flow elements: containers that provide structures in packages, tasks that provide functionality, and precedence constraints that connect the executables, containers, and tasks into an ordered control flow.

A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations. Before you can add a data flow to a package, the package control flow must include a Data Flow task. The Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow. A separate instance of the data flow engine is opened for each Data Flow task in a package.

SQL Server 2005 Integration Services (SSIS) provides three different types of data flow components: sources, transformations, and destinations. Sources extract data from data stores such as tables and views in relational databases, files, and Analysis Services databases. Transformations modify, summarize, and clean data. Destinations load data into data stores or create in-memory datasets. Q3:When a data flow component applies a transformation to column data, extracts data from sources, or loads data into destinations, errors can occur. Errors frequently occur because of unexpected data values.

For example, a data conversion fails because a column contains a string instead of a number, an insertion into a database column fails because the data is a date and the column has a numeric data type, or an expression fails to evaluate because a column value is zero, resulting in a mathematical operation that is not valid.

Errors typically fall into one the following categories:

-Data conversion errors, which occur if a conversion results in loss of significant digits, the loss of insignificant digits, and the truncation of

Page 19: SSIS

strings. Data conversion errors also occur if the requested conversion is not supported. -Expression evaluation errors, which occur if expressions that are evaluated at run time perform invalid operations or become syntactically incorrect because of missing or incorrect data values. -Lookup errors, which occur if a lookup operation fails to locate a match in the lookup table.

Many data flow components support error outputs, which let you control how the component handles row-level errors in both incoming and outgoing data. You specify how the component behaves when truncation or an error occurs by setting options on individual columns in the input or output.

For example, you can specify that the component should fail if customer name data is truncated, but ignore errors on another column that contains less important data.

Q 4:SSIS includes logging features that write log entries when run-time events occur and can also write custom messages.

Integration Services supports a diverse set of log providers, and gives you the ability to create custom log providers. The Integration Services log providers can write log entries to text files, SQL Server Profiler, SQL Server, Windows Event Log, or XML files.

Logs are associated with packages and are configured at the package level. Each task or container in a package can log information to any package log. The tasks and containers in a package can be enabled for logging even if the package itself is not.

To customize the logging of an event or custom message, Integration Services provides a schema of commonly logged information to include in log entries. The Integration Services log schema defines the information that you can log. You can select elements from the log schema for each log entry.

To enable logging in a package1. In Business Intelligence Development Studio, open the Integration Services project that contains the package you want.2. On the SSIS menu, click Logging. 3. Select a log provider in the Provider type list, and then click Add. Q 5 :

SQL Server 2005 Integration Services (SSIS) makes it simple to deploy packages to any computer. There are two steps in the package deployment process:-The first step is to build the Integration Services project to create a package deployment utility.

Page 20: SSIS

-The second step is to copy the deployment folder that was created when you built the Integration Services project to the target computer, and then run the Package Installation Wizard to install the packages.Q 9 :

Variables store values that a SSIS package and its containers, tasks, and event handlers can use at run time. The scripts in the Script task and the Script component can also use variables. The precedence constraints that sequence tasks and containers into a workflow can use variables when their constraint definitions include expressions.

Integration Services supports two types of variables: user-defined variables and system variables. User-defined variables are defined by package developers, and system variables are defined by Integration Services. You can create as many user-defined variables as a package requires, but you cannot create additional system variables.

Scope :

A variable is created within the scope of a package or within the scope of a container, task, or event handler in the package. Because the package container is at the top of the container hierarchy, variables with package scope function like global variables and can be used by all containers in the package. Similarly, variables defined within the scope of a container such as a For Loop container can be used by all tasks or containers within the For Loop container.

Question 1 - True or False - Using a checkpoint file in SSIS is just like issuing the CHECKPOINT command against the relational engine. It commits all of the data to the database. False. SSIS provides a Checkpoint capability which allows a package to restart at the point of failure.

Question 2 - Can you explain the what the Import\Export tool does and the basic steps in the wizard? The Import\Export tool is accessible via BIDS or executing the dtswizard command. The tool identifies a data source and a destination to move data either within 1 database, between instances or even from a database to a file (or vice versa).

Question 3 - What are the command line tools to execute SQL Server Integration Services packages? DTSEXECUI - When this command line tool is run a user interface is loaded in order to configure each of the applicable parameters to execute an SSIS package. DTEXEC - This is a pure command line tool where all of the needed switches must be passed into the command for successful execution of

Page 21: SSIS

the SSIS package.

Question 4 - Can you explain the SQL Server Integration Services functionality in Management Studio? You have the ability to do the following: Login to the SQL Server Integration Services instance View the SSIS log View the packages that are currently running on that instance Browse the packages stored in MSDB or the file system Import or export packages Delete packages Run packages

Question 5 - Can you name some of the core SSIS components in the Business Intelligence Development Studio you work with on a regular basis when building an SSIS package? Connection Managers Control Flow Data Flow Event Handlers Variables window Toolbox window Output window Logging Package Configurations

Question Difficulty = Moderate

Question 1 - True or False: SSIS has a default means to log all records updated, deleted or inserted on a per table basis. False, but a custom solution can be built to meet these needs.

Question 2 - What is a breakpoint in SSIS? How is it setup? How do you disable it? A breakpoint is a stopping point in the code. The breakpoint can give the Developer\DBA an opportunity to review the status of the data, variables and the overall status of the SSIS package. 10 unique conditions exist for each breakpoint. Breakpoints are setup in BIDS. In BIDS, navigate to the control flow interface. Right click on the object where you want to set the breakpoint and select the 'Edit Breakpoints...' option.

Question 3 - Can you name 5 or more of the native SSIS connection managers? OLEDB connection - Used to connect to any data source requiring an OLEDB connection (i.e., SQL Server 2000) Flat file connection - Used to make a connection to a single file in the File System. Required for reading information from a File System flat file

Page 22: SSIS

ADO.Net connection - Uses the .Net Provider to make a connection to SQL Server 2005 or other connection exposed through managed code (like C#) in a custom task Analysis Services connection - Used to make a connection to an Analysis Services database or project. Required for the Analysis Services DDL Task and Analysis Services Processing Task File connection - Used to reference a file or folder. The options are to either use or create a file or folder Excel FTP HTTP MSMQ SMO SMTP SQLMobile WMI

Question 4 - How do you eliminate quotes from being uploaded from a flat file to SQL Server? In the SSIS package on the Flat File Connection Manager Editor, enter quotes into the Text qualifier field then preview the data to ensure the quotes are not included. Additional information: How to strip out double quotes from an import file in SQL Server Integration ServicesQuestion 5 - Can you name 5 or more of the main SSIS tool box widgets and their functionality? For Loop Container Foreach Loop Container Sequence Container ActiveX Script Task Analysis Services Execute DDL Task Analysis Services Processing Task Bulk Insert Task Data Flow Task Data Mining Query Task Execute DTS 2000 Package Task Execute Package Task Execute Process Task Execute SQL Task etc.

Question Difficulty = Difficult

Question 1 - Can you explain one approach to deploy an SSIS package? One option is to build a deployment manifest file in BIDS, then copy the directory to the applicable SQL Server then work through the steps of the package installation wizard A second option is using the dtutil utility to copy, paste, rename, delete an SSIS Package A third option is to login to SQL Server Integration Services via SQL Server

Page 23: SSIS

Management Studio then navigate to the 'Stored Packages' folder then right click on the one of the children folders or an SSIS package to access the 'Import Packages...' or 'Export Packages...'option. A fourth option in BIDS is to navigate to File | Save Copy of Package and complete the interface.

Question 2 - Can you explain how to setup a checkpoint file in SSIS? The following items need to be configured on the properties tab for SSIS package: CheckpointFileName - Specify the full path to the Checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hard-coded path as shown above, it's a good idea to use an expression that concatenates a path defined in a package variable and the package name. CheckpointUsage - Determines if/how checkpoints are used. Choose from these options: Never (default), IfExists, or Always. Never indicates that you are not using Checkpoints. IfExists is the typical setting and implements the restart at the point of failure behavior. If a Checkpoint file is found it is used to restore package variable values and restart at the point of failure. If a Checkpoint file is not found the package starts execution with the first task. The Always choice raises an error if the Checkpoint file does not exist. SaveCheckpoints - Choose from these options: True or False (default). You must select True to implement the Checkpoint behavior.

Question 3 - Can you explain different options for dynamic configurations in SSIS? Use an XML file Use custom variables Use a database per environment with the variables Use a centralized database with all variables

Question 4 - How do you upgrade an SSIS Package? Depending on the complexity of the package, one or two techniques are typically used: Recode the package based on the functionality in SQL Server DTS Use the Migrate DTS 2000 Package wizard in BIDS then recode any portion of the package that is not accurate

Question 5 - Can you name five of the Perfmon counters for SSIS and the value they provide? SQLServer:SSIS Service SSIS Package Instances - Total number of simultaneous SSIS Packages running SQLServer:SSIS Pipeline BLOB bytes read - Total bytes read from binary large objects during the monitoring period. BLOB bytes written - Total bytes written to binary large objects during the monitoring period. BLOB files in use - Number of binary large objects files used during the data flow task during the monitoring period. Buffer memory - The amount of physical or virtual memory used by the data flow

Page 24: SSIS

task during the monitoring period. Buffers in use - The number of buffers in use during the data flow task during the monitoring period. Buffers spooled - The number of buffers written to disk during the data flow task during the monitoring period. Flat buffer memory - The total number of blocks of memory in use by the data flow task during the monitoring period. Flat buffers in use - The number of blocks of memory in use by the data flow task at a point in time. Private buffer memory - The total amount of physical or virtual memory used by data transformation tasks in the data flow engine during the monitoring period. Private buffers in use - The number of blocks of memory in use by the transformations in the data flow task at a point in time. Rows read - Total number of input rows in use by the data flow task at a point in time. Rows written - Total number of output rows in use by the data flow task at a point in time.

1. Demonstrate or whiteboard how you would suggest using configuration files in packages.  Would you consider it a best practice to create a configuration file for each connection manager or one for the entire package?

There should be a single configuration file for each connection manager in your packages that stores their connection string information.  So if you have 6 connection managers then you have 6 config files.  You can use the same config file across all your packages that use the same connections. 

If you have a single config file that stores all your connection managers then all your packages must have contain the connection managers that are stored in that config file.  This means you may have to put connection managers in your package that you don’t even need.

2. Demonstrate or whiteboard how checkpoints work in a package.

When checkpoints are enabled on a package if the package fails it will save the point at which the package fails.  This way you can correct the problem then rerun from the point that it failed instead of rerunning the entire package.  The obvious benefit to this is if you load a million record file just before the package fails you don’t have to load it again.

3. Demonstrate or whiteboard using a loop in a package so each file in a directory with the .txt extension is loaded into a table.  Before demonstrating this tell which task/container accomplishes this and which enumerator will be used.  (Big hint on which task/container to use is that it requires and enumerator)

This would require a Foreach Loop using the Foreach File Enumerator.  Inside the Foreach Loop Editor you need to set a variable to store the directory of the files that will be looped through.  Next select the connection manager used to load the files and add an expression to the connection string property that uses the variable created in the Foreach Loop.

Page 25: SSIS

4. Demonstrate or whiteboard how transactions work in a package.

If transactions are enabled on your package and tasks then when the package fails it will rollback everything that occurred during the package. First make sure MSDTC (Microsoft Distributed Transaction Coordinator) is enabled in the Control Panel -> Administrative Tools -> Component Services. Transactions must be enabled not only on the package level but also on each task you want included as part of the transaction. To have the entire package in a transaction set TransactionOption at the package level to Required and each task to Supported.

5. If you have a package that runs fine in Business Intelligence Development Studio (BIDS) but fails when running from a SQL Agent Job what would be your first guess on what the problem is?

The account that runs SQL Agent Jobs likely doesn’t have the needed permissions for one of the connections in your package. Either elevate the account permissions or create a proxy account.

To create a proxy account you need to first create new credentials with the appropriate permissions. Next assign those credentials to a proxy account. When you run the job now you will select Run As the newly created proxy account.

6. What techniques would you consider to add auditing to your packages?  You’re required to log when a package fails and how many rows were extracted and loaded in your sources and destinations.

I like to create a database that is designated for package auditing. Track row counts coming from a source and which actually make it to a destination. Row counts and package execution should be all in one location and then optionally report off that database.

There are also third party tools that can accomplish this for you (Pragmatic Works BI xPress).

7. What techniques would you consider to add notification to your packages?  You’re required to send emails to essential staff members immediately after a package fails.

This could either be set in the SQL Agent when the package runs or actually inside the package you could add a Send Mail Task in the Event Handlers to notify when a package fails.

There are also third party tools that can accomplish this for you (Pragmatic Works BI xPress).

8. Demonstrate or whiteboard techniques you would use to for CDC (Change Data Capture)?  Tell how you would write a package that loads data but first detects if the data already exists, exists but has changes, or is brand new data for a destination.

Page 26: SSIS

If for some reason you’ve avoided using a whiteboard to show your ideas to this point then make sure you start on this question! For small amounts of data I may use the Slowly Changing Dimension.

More often than not the data is too large to use in such a slow transform. I prefer to do a lookup on the key of the target table and rows that don’t match are obviously new rows that can be inserted. If they do match it’s possible they are updates or duplicates. Determine this by using a conditional split comparing rows from the target to incoming rows. Send updates to a staging table that can then be updated in an Execute SQL Task.

Explain that putting updates in a staging table instead of updating using the OLE DB Command is much better for performance because the Execute SQL Task performs a bulk operation.

9. Explain what breakpoints are and how you would use them.

Breakpoints put pauses in your package. It’s a great tool for debugging a package because you can place a breakpoint on a task and it will pause the package based on execution events.

A reason in which I have used breakpoints is when I have a looping container and I want to see how my variables are changed by the loop. I would place a watch window on the package and type the variable name in. Set a break point on the container the stop after each iteration of the loop.

  GENERALSSIS Q UESTIONSQuestion: Which versions of SSIS have you used?Comment: Differences between 2005 and 2008 are not very big so 2005, 2008 or 2008 R2 experienceusually is very similar. The big difference is with 2000 which had DTS and it very different (SSIS is createdfrom scratch)Question: Have you used SSIS Framework?Comment: This is common term in SSIS world which just means that you have templates that are set up toperform routine tasks like logging, error handling etc. Yes answer would usually indicate experiencedperson, no answer is still fine if your project is not very mission critical.Question: Do you have experienced working with data warehouses?Comment: SSIS is in most cases used for data warehouses so knowledge of Data Warehouses Designs isvery useful.Question: How have you learnt SSIS (on the job, articles, books, conferences)Comment: The thing is that most people who read good

Page 27: SSIS

books have usually an advantage over those whohasn't because they know what they know and they know what they don't know (but they know it existsand is available)…. Blog/Articlesvery in quality so best practise articles is a big plus+, conferences can bealso a plus.Question: Do you have certificationsComment: This is rather disappointed point for me. Qualifications generally are welcome but unfortunatelymany people simply cheat. Companies run courses and then give questions and answers, or people findthem on the internet. I've met people who had certification but knew very little, I've met people veryexperienced and knowledgeable without certification and people who have done certification for their self-satisfaction and are experienced and knowledgeable. In other words be careful with certification…. It is easyto get a misleading impression so make sure you ask the best questions for the position you can.SSIS DEVELOPMENTQ UESTIONSQuestion: How many difference source and destinations have you used?It is very common to get all kinds of sources so the more the person worked with the better for you.Common ones are SQL Server, CSV/TXT, Flat Files, Excel, Access, Oracle, MySQL but also Salesforce, webdata scrapping.Question: What configuration options have you used?Comment: This is an important one. Configuration should always be dynamic and usually is done using XMLand/or Environment Variable and SQL Table with all configurations.Question: Gives examples where you used parameters?Comment: Typically you use parameters to filter data in datasets (or data on reports) but you can also usethem to restrict values like the well-known make->model->year example. You can also have hidden andinternal parameters which get very handy for more advanced stuff.Question: How to quickly load data into sql server table?Comment: Fast Load option. This option is not set by default so most developers know this answer asotherwise the load is very slow.

SSIS Interview Questions and Answers : Series 3

How to create the deployment utility?Deployment utility can be created by right click on solution explorer setting create deployment utility to true.

How to deploy the packages SSIS?When you build the package by clicking on Solution explorer it creates the package.dtsx file.

If you are creating deployment utilities for deploying the packages it will create a manifest file. This manifest file will be the deployment utility. Clicking on it you will

Page 28: SSIS

get option of deploying on database or deploying on some location (create the package file).

Where a package does deploys in Integration service database?In integration services database MSDB database hold entire packages deployed at database.

How to implement Transaction in SSIS package?SSIS provides the facility of transaction to maintain data integrity. To implement transaction there are three options.

You can find out these options in properties. The options are.

Supported, Required and not supported.

Required: If the properties set to require than container or task will start the transaction or if it’s already started it will join the transaction.

Supported: If it’s supported then it will be the part of the transaction.

Not supported: if it’s not supported then it will neither initiates or nor be part of transactions.

What are the checkpoints in SQL Server?In SSIS by using checkpoints package can restarted from the point of failure. It avoids all the containers or the part of package which are executed successfully to re execute.