Course 10058 - Implementing Data Flow in SQL Server Integration Services 2008
Transcript of Course 10058 - Implementing Data Flow in SQL Server Integration Services 2008
7/16/2019 Course 10058 - Implementing Data Flow in SQL Server Integration Services 2008
http://slidepdf.com/reader/full/course-10058-implementing-data-flow-in-sql-server-integration-services-2008 1/90
Implementing Data Flow in
SQL Server Integration
Services 2008Course 10058
Table of Contents
Defining Data Sources and Destinations
Introduction
Lesson Introduction
Lesson Objectives
Introduction to Data Flows
Data Flow Sources
Object Linking and Embedding Database (OLE DB)
Flat file
Raw file
Excel
XML
ADO.NET (ActiveX Data Objects)
Data Flow Destinations
Valid Data Destinations
Invalid Data Destinations
Configuring Access and Excel Data Sources
Excel
Access
Data Flow Paths
Introduction
Lesson Introduction
Lesson Objectives
Introduction to Data Flow Paths
Data Viewers
Grid
Histogram
Scatter Plot
Column Chart
Implementing Data Flow Transformations: Part 1
Introduction
Lesson Introduction
Lesson Objectives
Introduction to Transformations
Data Formatting Transformations
Character Map transformation
Data Conversion transformation
Sort transformation
Aggregate transformation
Column Transformations
Copy Column transformation
Derived Column transformation
Import Column transformation
Export Column transformation
Multiple Data Flow Transformations
Conditional Split transformation
Multicast transformation
Merge transformation
Merge Join transformation
Union All transformation
Custom Transformations
Script Component transformation
OLE DB Command transformation
Slowly Changing Dimension Transformation
Implementing Data Flow Transformations: Part 2
Introduction
Lesson Introduction
Lesson Objectives
Creating a Lookup and Cache Transformation
Data Analysis Transformations
Pivot transformation
Unpivot transformation
Data Mining Query transformation
Data Sampling Transformations
Percentage Sampling transformation
Row Sampling transformation
Row Count transformation
Audit Transformations
Fuzzy Transformations
Fuzzy Lookup
Fuzzy Grouping
Term Transformations
Term Extraction transformation
Term Lookup transformation
Best Practices
Lab: Implementing Data Flow in SQL Server Integration Services 2008
Lab Overview
Lab Introduction
Lab Objectives
Scenario
Exercise Information
Exercise 1: Defining Data Sources and Destinations
Exercise 2: Working with Data Flow Paths
Exercise 3: Implementing Data Flow Transformations
Lab Instructions: Implementing Data Flow in SQL Server Integration Services 2008
Exercise 1: Defining Data Sources and Destinations
Exercise 2: Working with Data Flow Paths
Exercise 3: Implementing Data Flow Transformations
Lab Review
What is the purpose of Data Flow paths?
What kind of errors can be managed by the error output Data Flow path?
What data types does the Export Column transformation manage?
What is the difference between a Type 1 and a Type 2 Slowly Changing Dimension and how are they represented in the Slowly Changing Dimension transformation?
What is the difference between a Lookup and a Fuzzy Lookup transformation?
Module Summary
Defining Data Sources and Destinations
Data Flow Paths
Implementing Data Flow Transformations: Part 1
Implementing Data Flow Transformations: Part 2
Lab: Implementing Data Flow in SQL Server Integration Services 2008
Glossary
Defining Data Sources and Destinations
Introduction
Lesson Introduction
SSIS provides support for a wide range of data sources and destinations within a package. The starting
point of a Data Flow task is to define the data source, which informs the Data Flow task of the location
of the data that will be moved. Depending on the data source used, different properties must be
configured. Understanding the properties that are available within a data source will help you configure
them efficiently.
Data flow destinations are objects within the Data Flow task that must be configured separately from
data sources. Like data sources, they consist of properties that need to be configured to inform SSIS of
the destination that the data will be loaded into. There are also additional data destinations, such as
Analysis Services.
Lesson Objectives
After completing this lesson, you will be able to:
Describe data flows.
Use data flow sources.
Use data flow destinations.
Configure an OLE DB data source.
Configure Microsoft Office Access and Microsoft Office Excel data sources.
Introduction to Data Flows
Data flows are configured within the Data Flow task to determine the location of the source data, the
destination that the data will be inserted into and, optionally, any transformations that may be
performed on the data as it is moved between the source and the destination.
SQL Server Integration Services starts by defining a data source. Depending on the data source chosen,
different properties will have to be configured.
Typically, you define connection information, which includes the server name of the source data, the
database name if you are accessing a table within a database, or the filename if the source is a text or a
raw file.
You can also define more than one data source.
After the data source is defined, you can optionally add one or more transformations.
Transformations are used to modify the data so that it can be standardized.
SQL Server Integration Services provides a wide variety of transformations to meet an organization's
requirements.
Each transformation contains different properties to control how the data is changed.
You then define the data destinations into which the transformed data is loaded.
Like data sources, the properties that are configured will differ depending on the data destination
chosen, and you are not limited to one data destination.
To connect data sources, transformations and data destinations together, you use Data Flow paths to
control the flow of the Data Flow task.
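The source-to-transformation-to-destination pattern described above can be sketched outside SSIS as a simple pipeline. The Python below is only an illustrative analogy, not SSIS code; SSIS packages are configured graphically, and all of the function names here are invented for the example:

```python
# Illustrative analogy of an SSIS data flow: a source feeds rows through
# optional transformations into a destination. All names are invented
# for this sketch; SSIS itself is configured graphically, not coded.

def source():
    """Data source: yields rows from some origin (here, an in-memory list)."""
    for row in [{"name": "anna", "amount": "10"}, {"name": "BOB", "amount": "25"}]:
        yield row

def transform(rows):
    """Transformation: standardizes data as it moves from source to destination."""
    for row in rows:
        yield {"name": row["name"].title(), "amount": int(row["amount"])}

def destination(rows):
    """Destination: loads the transformed rows (here, simply collects them)."""
    return list(rows)

# The "Data Flow path" is the chaining of the components.
loaded = destination(transform(source()))
print(loaded)  # [{'name': 'Anna', 'amount': 10}, {'name': 'Bob', 'amount': 25}]
```

The chaining of the calls plays the role that Data Flow paths play in the designer: it dictates the order in which data moves between components.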
Data Flow Sources
SSIS provides a range of data source connections that you can use to access source data from a wide
variety of technologies. Additional sources are also available for download, such as the Microsoft
Connectors for Oracle and Teradata by Attunity and the Microsoft SQL Server 2008 Feature Pack.
Object Linking and Embedding Database (OLE DB)
Using OLE DB, you can access the data that exists in SQL Server, Access and Excel. You can also
connect to OLE DB providers for third-party databases. With OLE DB, you can access data directly from
tables or views within a database. You can also use SQL statements to specifically target the data that
you wish to return and take advantage of SQL clauses, such as ORDER BY, to retrieve the data.
Furthermore, parameters can be defined in the SQL statement by using ? (question marks) and mapping
the parameter to SSIS variables. The following properties can be configured:
Connection Manager page. Here, you can define a connection to the server, the database and
the authentication by clicking the New button. The Data Access Mode has a list where you can
define how to access the data. The options in the list can include selecting Table or View, Table Name or View name from a variable, a SQL Command or a SQL Command from a variable.
Depending on what is selected, the options can change whereby you can select a specific table,
view, variable or you can manually type the SQL command. There is also a Preview button to
view the data.
Columns page. You can use this page to view the Available External Columns so you can choose
which columns are part of the data source. They will appear under the External Columns if
selected. You can also rename the output of the column by typing in a different column name in
the Output Column list.
Error Output page. You can use this page to control the error handling options. Should the data
fail, you can ignore the failure, redirect the row or fail the component. This can be specified if
the error is caused by data truncation or general data errors. The Column property lists the
columns that are part of the data source, and you can add an optional description.
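The `?` parameter markers described above follow the same positional-placeholder convention used by many data access APIs. As a rough, non-SSIS illustration, Python's standard-library sqlite3 module uses identical `?` placeholders that are bound to values at execution time, much as SSIS maps them to package variables (the table and column names below are invented for the example):

```python
import sqlite3

# Rough illustration of '?' parameter markers -- the same convention the
# OLE DB source uses in its SQL command text. The bound values below
# stand in for what SSIS would supply from package variables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Region TEXT, Amount INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?)",
                 [("North", 100), ("South", 250), ("North", 75)])

# Each '?' is bound positionally, like mapping a parameter to an SSIS variable.
region, minimum = "North", 50
rows = conn.execute(
    "SELECT Region, Amount FROM Sales WHERE Region = ? AND Amount > ? ORDER BY Amount",
    (region, minimum),
).fetchall()
print(rows)  # [('North', 75), ('North', 100)]
```

Note how an ORDER BY clause in the command text, as mentioned above, shapes the data before it ever enters the data flow.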
Flat file
You can connect to text files by using the Flat file data source connection. This allows you to control how
the text file is structured by defining the column and row delimiter. You can also define if the first row
contains headers and provide information about the width of the columns and the locale of the text file.
The following properties can be configured:
Connection Manager page. Here, you can define a connection to the text file by clicking the New
button. This opens up a Flat File Connection Manager Editor, where you can define the location
of the text file, the column and row delimiter, whether the text is qualified, the locale of the text
file and whether the first row contains headings. Once defined, you can preview the data by
clicking the Preview button. You can also specify whether null columns in the text file are
retained by selecting the check box next to Retain null values from the data source as null values
in the data flow.
Columns page. This page enables you to view the Available External Columns so you can choose
which columns are part of the data source. If selected, they appear under the External Columns.
You can also rename the output of the column by typing in a different column name in the
Output Column list.
Error Output page. You can use this page to control the error handling options. Should the data
fail, you can ignore the failure, redirect the row or fail the component. This can be specified if
the error is caused by data truncation or general data errors. The Column property lists the
columns that are part of the data source and you can add an optional description.
In the advanced properties, the Fast Parse property provides a fast, simple set of routines for parsing
data. These routines are not locale-sensitive and they support only a subset of date, time and integer
formats. By implementing Fast Parse, a package forfeits its ability to interpret date, time and numeric
data in locale-specific formats.
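To make the flat-file options concrete, the sketch below (plain Python, not SSIS) parses a small delimited file with a header row, and converts a date column using a single fixed format. The fixed-format conversion is loosely analogous to Fast Parse, which trades locale-aware interpretation for speed; the file content and column names are invented for the example:

```python
import csv
import io
from datetime import datetime

# A small "flat file": column delimiter ',', row delimiter newline, and a
# first row containing headers -- the same choices the Flat File
# Connection Manager Editor asks for. Content is invented for the example.
flat_file = io.StringIO("OrderID,OrderDate,Qty\n1,2008-03-15,10\n2,2008-04-01,3\n")

reader = csv.DictReader(flat_file, delimiter=",")
orders = []
for row in reader:
    orders.append({
        "OrderID": int(row["OrderID"]),
        # One fixed, locale-insensitive format -- loosely analogous to
        # Fast Parse, which supports only a subset of date/time formats.
        "OrderDate": datetime.strptime(row["OrderDate"], "%Y-%m-%d").date(),
        "Qty": int(row["Qty"]),
    })

print(orders[0])
```

A locale-aware parser would accept many regional date notations at a cost in speed; the single-format approach, like Fast Parse, fails fast on anything outside the expected subset.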
Raw file
The Raw file data flow source is used to retrieve raw data that was previously written by the Raw File
destination. It allows for fast reading and writing of data, and raw files are typically used as
intermediary data files in a larger data load operation. The Raw file source has fewer configuration
options than the Flat file source, and because no translation of the data is required, data extraction is
fast. There is no Error Output page for this data source, as little parsing of the data is required. The
following properties can be configured:
Connection Manager page. Here, you can define a connection to the raw file by first specifying
the Access mode; this can either be a filename or a filename from a variable. If Filename is
selected, you can then browse to the Raw file in the file system. If Filename from Variable is
selected, you can select the variable from a drop-down list.
Columns page. This page enables you to view the Available External Columns so that you can
choose which columns are part of the data source. If selected, they appear under the External
Columns. You can also rename the output of the column by typing in a different column name in
the Output Column list.
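Conceptually, a raw file is a binary image of the rows that needs no parsing on the way back in. The Python sketch below imitates that idea with fixed-width binary records written via the standard-library struct module; this is only an analogy for the concept, as the actual SSIS raw file format is internal to SSIS:

```python
import struct
import tempfile

# Analogy for the Raw File destination/source pair: rows are written as
# fixed-width binary records, so reading them back requires no delimiter
# handling, no locale logic and no translation. (The real SSIS raw file
# format is internal to SSIS; this only illustrates the idea.)
RECORD = struct.Struct("<i10s")  # int32 key + 10-byte name field

rows = [(1, b"alpha"), (2, b"beta")]

# "Raw File destination": dump the records to an intermediary file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    for key, name in rows:
        f.write(RECORD.pack(key, name.ljust(10, b"\x00")))

# "Raw file source": read the records straight back, unparsed.
read_back = []
with open(path, "rb") as f:
    while chunk := f.read(RECORD.size):
        key, name = RECORD.unpack(chunk)
        read_back.append((key, name.rstrip(b"\x00")))

print(read_back)  # [(1, b'alpha'), (2, b'beta')]
```

Because the reader only has to copy bytes back into typed fields, this round trip illustrates why raw files make fast intermediaries in a larger load operation.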
Excel
Excel 2007 requires the Microsoft Office 12.0 Access Database Engine OLE DB provider.
For earlier versions of Excel, use the Excel Source data source component. The options are similar to the
OLE DB data source, except that you point the connection manager to the Excel file. Any named ranges
that are defined in Excel are the equivalent of tables and views. The following properties can be
configured:
Connection Manager page. Here, you can define a connection to the Excel file by clicking the
New button and browsing to the Excel file in the Excel Connection Manager dialog box. The Data Access Mode has a list where you can define how to access the data. The list can include
selecting Table or View, Table Name or View name from a variable, a SQL Command or a SQL
Command from a variable. Depending on what is selected, the options can change whereby you
can select a specific table, view, variable or you can manually type in the SQL command by using
the worksheet name as the equivalent to a table name in the FROM clause. There is also a
Preview button to view the data.
Columns page. You can use this page to view the Available External Columns so that you can
choose which columns are part of the data source. They appear under the External Columns, if
selected. You can also rename the output of the column by typing in a different column name in
the Output Column list.
Error Output page. You can use this page to control the error handling options. Should the data
fail, you can ignore the failure, redirect the row or fail the component. This can be specified if
the error is caused by data truncation or general data errors. The Column property lists the
columns that are part of the data source and you can add an optional description.
XML
The XML data source helps you retrieve data from an XML source document. You can also specify a
schema, either an inline schema or a separate XML Schema Definition (XSD) file, that describes the
structure of the XML data. Document Type Definition (DTD) files are not supported. A schema can
support a single namespace; schema collections are not supported. Note that the XML source does not
validate the data in the XML file against the XSD file. The following properties can be configured:
Connection Manager page. The Data Access Mode has a list where you can define how to access
the XML data. The list can include selecting XML File Location, XML file from a variable or XML
data from a variable. Depending on what is selected, the options can change whereby you
can select a specific file or variable from the list below the Data Access Mode. You can also
define if the XML file or fragment works in conjunction with an XSD file. This can either be
located in the existing XML data, in which case you can select the Use inline Schema check box
or you can refer to a separate XSD file by clicking on the Browse button next to the XSD Location
box. There is also a Preview button to view the data.
Columns page. You can use this page to view the Available External Columns so that you can
choose which columns are part of the data source. They appear under the External Columns, if
selected. You can also rename the output of the column by typing in a different column name in
the Output Column list.
Error Output page. You can use this page to control the error handling options. Should the data
fail, you can ignore the failure, redirect the row or fail the component. This can be specified if
the error is caused by data truncation or general data errors. The Column property lists the
columns that are part of the data source and you can add an optional description.
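As a non-SSIS illustration of reading XML, Python's standard-library xml.etree parses a source document and flattens element content into rows, much as the XML source exposes elements as output columns. Like the XML source with an XSD, ElementTree does not validate the document against a schema here; the element and attribute names are invented for the example:

```python
import xml.etree.ElementTree as ET

# A small XML source document; element and attribute names are invented.
xml_data = """
<Customers>
  <Customer id="1"><Name>Contoso</Name></Customer>
  <Customer id="2"><Name>Fabrikam</Name></Customer>
</Customers>
"""

# Parse the XML and flatten it into rows, analogous to how the XML
# source exposes element content as output columns. Note that, like the
# SSIS XML source and its XSD, ElementTree performs no schema
# validation in this code -- malformed values would pass through.
root = ET.fromstring(xml_data)
rows = [(c.get("id"), c.findtext("Name")) for c in root.findall("Customer")]
print(rows)  # [('1', 'Contoso'), ('2', 'Fabrikam')]
```

If schema enforcement is required, it has to happen downstream of the source, which is exactly the caveat noted above for the SSIS XML source.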
ADO.NET (ActiveX Data Objects)
You can use the ADO.NET source to connect to a database and retrieve data by using .NET. The options
that are available within the ADO.NET data source are very similar to the OLE DB data source, and it can
use the .NET provider for OLE DB to create a DataReader, which enables a single row of data at a time to
be loaded into memory. However, unlike the OLE DB data source, the ADO.NET data source can also
access non-OLE DB connections, such as the .NET data providers for ODBC. The following properties
can be configured:
Connection Manager page. Here, you can define a connection to the server, the database and
the authentication by clicking the New button. The Data Access Mode has a drop-down list
where you can define how to access the data, which can include selecting Table or View, Table
Name or View name from a variable, a SQL Command or a SQL Command from a variable.
Depending on what is selected, the options can change whereby you can select a specific table,
view, variable or you can manually type in the SQL command. There is also a Preview button to
view the data.
Columns page. You can use this page to view the Available External Columns so that you can
choose which columns are part of the data source. They appear under the External Columns, if
selected. You can also rename the output of the column by typing in a different column name in
the Output Column list.
Error Output page. You can use this page to control the error handling options. Should the data
fail, you can ignore the failure, redirect the row or fail the component. This can be specified if
the error is caused by data truncation or general data errors. The Column property lists the
columns that are part of the data source and you can add an optional description.
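The DataReader behavior described above, a forward-only stream that holds one row in memory at a time, can be mimicked with any cursor-style API. Here is a rough Python sqlite3 sketch of the idea, not ADO.NET itself, with invented table data:

```python
import sqlite3

# Rough analogy of a DataReader: a forward-only cursor that surfaces one
# row at a time instead of materializing the whole result set in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Name TEXT)")
conn.executemany("INSERT INTO Products VALUES (?)",
                 [("Bike",), ("Helmet",), ("Pump",)])

cursor = conn.execute("SELECT Name FROM Products ORDER BY Name")

names = []
while (row := cursor.fetchone()) is not None:  # one row in memory at a time
    names.append(row[0])

print(names)  # ['Bike', 'Helmet', 'Pump']
```

The contrast is with a `fetchall()`-style call, which would pull every row into memory at once; the row-at-a-time style is what keeps a DataReader's memory footprint small on large result sets.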
Data Flow Destinations
Valid Data Destinations
Excel
Recordset
Flat file
SQL Server
OLE DB
SQL Server Compact
ADO.NET
Raw file
SQL Server Analysis Services (SSAS) partition
SSAS dimension
SSAS data mining training model
Invalid Data Destinations
SQL Server Reporting Services (SSRS)
Access
XML
Configuring Access and Excel Data Sources
Before working with the data sources in the Data Flow task, connection managers are created so that
they can easily be used within the Data Flow task. There are considerations to be mindful of when
using Access and Excel in your SSIS package.
Excel
To connect to Excel, it is important to understand that different connection managers are used
depending on the version of Excel that you are connecting to. To connect to a workbook in Excel 2003 or
an earlier version of Excel, you must create an Excel connection manager from the Connection Managers
area.
To create an Excel connection manager, perform the following steps:
1. In Business Intelligence Development Studio, open the package.
2. In the Connection Managers area, right-click anywhere in the area, and then select New
Connection.
3. In the Add SSIS Connection Manager dialog box, select Excel, and then configure the connection
manager.
To connect to a workbook in Excel 2007, you must create an OLE DB connection manager from the
Connection Managers area.
To create an OLE DB connection manager, perform the following steps:
1. In Business Intelligence Development Studio, open the package.
2. In the Connection Managers area, right-click anywhere in the area, and then select New OLE
DB Connection.
3. In the Configure OLE DB Connection Manager dialog box, click New.
4. In the Connection Manager dialog box, for Provider, select Microsoft Office 12.0 Access
Database Engine OLE DB Provider.
Access
To connect to Access, it is important to understand that different OLE DB providers are used
depending on the version of Access that you are connecting to. If you want to connect to a data source
in Access 2003 or an earlier version of Access, you must create an OLE DB connection manager that uses
the Microsoft Jet 4.0 OLE DB Provider, from the Connection Managers area.
To create this connection manager, perform the following steps:
1. In Business Intelligence Development Studio, open the package.
2. In the Connection Managers area, right-click anywhere in the area, and then select New OLE
DB Connection.
3. In the Configure OLE DB Connection Manager dialog box, click New.
4. In the Connection Manager dialog box, for Provider, select Microsoft Jet 4.0 OLE DB Provider,
and then configure the connection manager as appropriate.
If you want to connect to a data source in Access 2007, you must create an OLE DB connection manager
from the Connection Managers area. To create an OLE DB connection manager, perform the following
steps:
1. In Business Intelligence Development Studio, open the package.
2. In the Connection Managers area, right-click anywhere in the area, and then select New OLE
DB Connection.
3. In the Configure OLE DB Connection Manager dialog box, click New.
4. In the Connection Manager dialog box, for Provider, select Microsoft Office 12.0 Access
Database Engine OLE DB Provider, and then configure the connection manager as appropriate.
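For reference, the two providers described above produce connection strings along these lines (the file paths are purely illustrative):

```text
; Access 2003 / Excel 2003 files (Jet 4.0 provider)
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\Sales.mdb;
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\Sales.xls;Extended Properties="Excel 8.0;HDR=YES";

; Access 2007 / Excel 2007 files (ACE 12.0 provider)
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\Sales.accdb;
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\Sales.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES";
```

The HDR=YES extended property tells the provider to treat the first row of an Excel range as column headers.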
Data Flow Paths
Introduction
Lesson Introduction
Data Flow paths are similar to Control Flow paths in that they control the flow of data within a Data Flow
task. A Data Flow path can simply connect a data source directly to a data destination.
Typically, you use a Data Flow path to determine the order in which transformations take place,
specifying the path that is taken when a transformation succeeds or fails. This provides the ability to
separate the data that causes errors from the data that is successfully transformed.
You can add data viewers to the Data Flow path. This enables you to get a snapshot of the data that is
being transformed. This is useful when developing packages when you wish to see the data before and
after it is transformed.
Lesson Objectives
After completing this lesson, you will be able to:
Describe Data Flow paths.
Configure a data flow path.
Describe a data viewer.
Use a data viewer.
Introduction to Data Flow Paths
Data Flow paths play an important role in controlling the order that data is transformed between a
source connection and the destination connection.
Here you can control the flow of the data when a Data Flow component executes successfully, and also
control the flow when the component fails. This enables you to create robust data flows.
When a data source or transformation is added to the Data Flow designer, a green arrow appears
underneath the Data Flow component.
You can click and drag the arrow to connect it to another Data Flow component.
This indicates that, on successful execution of the first Data Flow component, the data flow provides
input data to the next Data Flow component.
When this is done, a red arrow appears under the original Data Flow component.
You can click and drag this arrow to another Data Flow component, typically a data destination.
This arrow represents the error output of the Data Flow component: rows that fail are sent along this
path as input to the next Data Flow component that it is connected to.
In this manner, you can control the workflow of the Data Flow tasks by using the Data Flow paths.
The Data Flow paths can be configured by double-clicking on a Data Flow path. Properties can include
name and description.
You can also view the metadata of the data that is involved in the data flow.
Data viewers can also be configured so that you can view the data as it is passing through the data flow.
Data Viewers
A data viewer is a useful debugging tool that enables you to view the data as it passes through the data
flow between two data flow components. You can apply data viewers to any data flow path so that you
can view the state of the data at each stage of the Data Flow task. Data viewers provide four different
methods for viewing the data.
A data viewer window shows data one buffer at a time. By default, the data flow pipeline limits buffers
to about 10,000 rows. If the data flow extracts more than 10,000 rows, it will pass that data through the
pipeline in multiple buffers. For example, if the data flow is extracting 25,000 rows, the first two buffers
will contain about 10,000 rows, and the third buffer will contain about 5,000 rows. You can advance to
the next buffer by clicking the green arrow in the data viewer window.
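The buffer arithmetic above can be sketched as follows. This is a simplified model: the real pipeline sizes buffers by memory, so 10,000 rows is only an approximation.

```python
def split_into_buffers(total_rows, buffer_rows=10_000):
    """Yield the row count of each buffer the pipeline would pass downstream."""
    while total_rows > 0:
        batch = min(buffer_rows, total_rows)
        yield batch
        total_rows -= batch

# 25,000 extracted rows arrive as two full buffers and one partial buffer.
print(list(split_into_buffers(25_000)))  # [10000, 10000, 5000]
```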
Grid
The Grid data viewer type returns the data in rows and columns in a table. This is useful if you want to
view the impact that a transformation has had on the data.
The data viewer allows you to copy the data within the data viewer so that it can be stored in a separate
file such as an Excel file.
Histogram
Working with numeric data only, the Histogram data viewer type allows you to select one column from
the data flow. The histogram then displays the distribution of numeric data within the specified column.
This is useful if you wish to view the frequency that particular numeric values have within a specific
column. You can also copy the results to an external file.
Scatter Plot
The Scatter Plot data viewer type allows you to select two numeric columns from a data source. This
information is then plotted on the X-axis and Y-axis of a chart.
With this data viewer, you can see how the numeric data from the two columns are related to each
other. This information can be copied to an external file.
Column Chart
The Column Chart data viewer type allows you to select one column from the data flow. This presents a
column chart that shows the number of occurrences of a value within the data.
This can provide an indication of the data values that are stored within the data. The result can be
copied to an external file.
Implementing Data Flow Transformations: Part 1
Introduction
Lesson Introduction
Data Flow transformations help ensure that your BI solution provides one version of the truth when it
comes to providing data to the data warehouse. Transformations can be used to change the format of
data, sort and group data, and perform custom transformations to ensure that the data is placed in the
data warehouse in a standardized format that can then be consumed by Analysis Services as a cube, by
Reporting Services as reports, or by a combination of both.
Understanding the capabilities of the many transformations that are available will aid you in building a
trusted data warehouse.
Lesson Objectives
After completing this lesson, you will be able to:
Describe transformations.
Use data formatting transformations.
Use column transformations.
Use multiple Data Flow transformations.
Use custom transformations.
Implement transformations.
Use Slowly Changing Dimension transformation.
Introduction to Transformations
Transformations are the aspect of SQL Server Integration Services that allows you to change data as it is
being moved from a source connection to a destination connection, such as from a text file to a table
within a database.
A transformation can be as simple as performing a straight copy of the data between a source and a
destination, or as complex as performing fuzzy lookups on the data being moved.
All of them, however, can be used to standardize and cleanse the data; an important objective when
loading a data warehouse with data.
Data Formatting Transformations
Data formatting transformations convert data as it passes through the data flow. By using these
transformations, you can change data types, adjust value lengths, convert values to a different case or
perform a number of other operations. Sorting and grouping transformations reorganize data as it
passes through the data flow.
Character Map transformation
The Character Map transformation applies string operations to the data. For example, you can convert
data from lowercase to uppercase for a State column in a customer’s table. The transformation can be
performed in place or a new output column can be generated from the character map conversion.
Mapping Operations with the Character Map Transformation
The following table describes the mapping operations that the Character Map transformation supports.
Value Description
Lowercase Convert to lower case.
Uppercase Convert to upper case.
Byte reversal Convert by reversing byte order.
Hiragana Convert Japanese katakana characters to hiragana.
Katakana Convert Japanese hiragana characters to katakana.
Half width Convert full-width characters to half-width.
Full width Convert half-width characters to full-width.
Linguistic casing Apply linguistic rules of casing (Unicode simple case mapping for Turkic and other locales) instead of the system rules.
Simplified Chinese Convert traditional Chinese characters to simplified Chinese.
Traditional Chinese Convert simplified Chinese characters to traditional Chinese.
Mutually Exclusive Mapping Operations
More than one operation can be performed in a transformation. However, some mapping
operations are mutually exclusive. The following table lists restrictions that apply when you use
multiple operations on the same column. Operations in the columns Operation A and Operation
B are mutually exclusive.
Operation A Operation B
Lowercase Uppercase
Hiragana Katakana
Half width Full width
Traditional Chinese Simplified Chinese
Lowercase Hiragana, katakana, half-width, full-width
Uppercase Hiragana, katakana, half-width, full-width
You use the Character Map Transformation Editor dialog box to make the changes by using the following
properties:
Available Input Columns. The Available Input Columns enables you to select the columns that
the operation will affect. When a column is selected, it appears in the Input Columns list.
Destination column. You use the Destination column to determine whether the change will
generate a new column or be made in place.
Operation column. The Operation column provides a drop-down list to specify the operation that occurs on the data, such as Uppercase.
Output Alias column. The Output Alias column allows you to name the column name for a new
column destination or retains the same column name for transformations that are an in-place
change.
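As a rough sketch of the in-place versus new-column behavior, the following is a simplified Python model, not the SSIS implementation, and only two of the mapping operations are shown:

```python
def character_map(row, column, operation, output_alias=None):
    """Apply a string operation to one column, in place or as a new output column."""
    ops = {"Uppercase": str.upper, "Lowercase": str.lower}
    value = ops[operation](row[column])
    row[output_alias or column] = value  # no alias: in-place change
    return row

row = {"State": "wa", "City": "Seattle"}
character_map(row, "State", "Uppercase")                        # in-place change
character_map(row, "City", "Uppercase", output_alias="CityUC")  # new column
print(row)  # {'State': 'WA', 'City': 'Seattle', 'CityUC': 'SEATTLE'}
```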
Data Conversion transformation
The Data Conversion transformation converts data from one data type to another during the data flow
and creates a new column with the new data. This can be useful when data is extracted from different
data sources and needs standardizing before being loaded into a single destination. Like the Character
Map transformation, this may cause some of the data to be truncated; you can use the Configure Error
Output option to handle such types of errors.
The Data Conversion task can be configured by using the following properties:
Available Input Columns. The Available Input Columns enables you to select the columns that
the operation will affect; when a column is selected, it appears in the Input Columns list.
Output Alias column. The Output Alias column allows you to define a name for the new column.
You can then set the DataType, Length, Precision and Scale for the data to be converted.
Furthermore, the Code Page is used to define the code page for any columns that use the
DT_STR data type.
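The combination of data conversion and error redirection can be sketched as follows. This is a simplified Python model; the column names are hypothetical:

```python
def convert_column(rows, column, new_column, convert):
    """Create new_column by converting column; redirect failing rows to an error output."""
    good, error_rows = [], []
    for row in rows:
        try:
            good.append(dict(row, **{new_column: convert(row[column])}))
        except (ValueError, TypeError):
            error_rows.append(row)  # the row is redirected, not dropped silently
    return good, error_rows

rows = [{"Qty": "12"}, {"Qty": "n/a"}, {"Qty": "7"}]
good, bad = convert_column(rows, "Qty", "QtyInt", int)
print([r["QtyInt"] for r in good], bad)  # [12, 7] [{'Qty': 'n/a'}]
```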
Sort transformation
The Sort transformation takes data from an input and then sorts the data in ascending or descending
order when passed to the output. The Sort transformation can perform multiple sorts on different
columns within the same transformation and duplicate values can be removed from the Sort operation.
Any columns that are not part of the Sort operation are passed through to the transformation output.
Within the Sort Transformation Editor dialog box, the Available Input Columns enables you to select the
columns that the operation will affect. When a column is selected, it appears in the Input Columns list.
The Output alias defines the name of the output column, which is the same name as the input column
name. The Sort Type property determines whether the sort operation is ascending or descending, and
the Sort Order property controls which column is sorted first when multiple columns are defined. The lowest
number specified is the first column to be sorted. Comparison Flags can be set to ignore case and ignore
character width. To remove duplicate values, ensure that the Remove rows with duplicate sort values
check box is selected.
The Sort transformation does not support Error Output configuration.
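The Sort Type, Sort Order and duplicate-removal behavior can be modeled as follows; this is a simplified Python sketch in which rows are represented as dictionaries:

```python
def sort_rows(rows, sort_columns, remove_duplicates=False):
    """sort_columns: list of (column, ascending) in Sort Order (lowest number first)."""
    out = list(rows)
    # Apply keys in reverse so the first-listed column becomes the primary sort.
    for column, ascending in reversed(sort_columns):
        out.sort(key=lambda r: r[column], reverse=not ascending)
    if remove_duplicates:
        seen, deduped = set(), []
        for r in out:
            key = tuple(r[c] for c, _ in sort_columns)
            if key not in seen:
                seen.add(key)
                deduped.append(r)
        out = deduped
    return out

rows = [{"Country": "US", "Sales": 5}, {"Country": "DE", "Sales": 9},
        {"Country": "US", "Sales": 8}]
result = sort_rows(rows, [("Country", True), ("Sales", False)])
print([(r["Country"], r["Sales"]) for r in result])  # [('DE', 9), ('US', 8), ('US', 5)]
```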
Aggregate transformation
The Aggregate transformation applies aggregate functions to a set of numeric data to create a new
transformation output. It also supports a Group By operation, similar to the Transact-SQL GROUP BY
clause, which allows you to apply aggregate functions to groups of data.
The Aggregate Transformation Editor dialog box contains two tabs that contain properties.
On the Aggregations tab, the Available Input Columns enables you to select the columns that the
operation will affect. When a column is selected, it appears in the Input Columns list. The Output alias
defines the name of the output column. The Operation column determines the aggregate function that
is used or the Group By operator can be defined. Comparison flags can be configured to refine the data
that is aggregated such as ignore spacing.
The Count Distinct Scale property can be used to specify an approximate number of distinct values, and
the Count Distinct Keys property can be used to specify an exact number of distinct values.
Alternatively, by clicking the Advanced button, you can use the Keys property to specify an exact
number of keys or the Keys Scale property to specify an approximate number of keys. These values can
be used to improve the performance of the Aggregate transformation, and they can also be configured
on the Advanced tab.
The Aggregate transformation does not support Error Output.
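The Group By plus aggregate behavior can be sketched as follows; this is a simplified Python model of the transformation's core operation:

```python
from collections import defaultdict

def aggregate(rows, group_by, column, operation):
    """Group By one column and apply an aggregate operation to another."""
    ops = {"Sum": sum, "Count": len, "Max": max, "Min": min,
           "Average": lambda v: sum(v) / len(v)}
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[column])
    return {key: ops[operation](values) for key, values in groups.items()}

rows = [{"Region": "East", "Sales": 10}, {"Region": "West", "Sales": 4},
        {"Region": "East", "Sales": 6}]
print(aggregate(rows, "Region", "Sales", "Sum"))  # {'East': 16, 'West': 4}
```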
Column Transformations
Column transformations copy and create columns in the data flow. The transformations enable you to
import large files, such as images or documents, into the data flow or export the same to a file.
Copy Column transformation
The Copy Column transformation takes a data flow input and creates a new column as the
transformation output. You have the ability to create multiple copies of the same column.
The Copy Column Transformation Editor dialog box consists of the Available Input Columns property,
which enables you to select the columns that the Copy Column operation will affect. When a column is
selected, it appears in the Input Columns list. The Output alias allows you to define the name of the
output column.
The Copy Column transformation does not support Error Output configuration.
Derived Column transformation
The Derived Column transformation allows you to create a new column, or replace values in an existing
column, by using expressions. A derived column is built from a combination of variables, functions,
operators and columns from the transformation input. You can use this transformation to concatenate
columns, use functions to extrapolate information from existing input columns and perform
mathematical calculations.
The Derived Column Transformation Editor dialog box contains an expression editor used to create
expressions within the Expression property. The Derived Column property allows you to determine if the
operation will create a New Column or replace values in an Existing column. This setting affects the
Derived Column Name property that allows you to specify the name for the column. You can then set
the DataType, Length, Precision and Scale for the data to be derived. Furthermore, the Code Page is
used to define the code page for any columns that use the DT_STR data type.
The Derived Column transformation may cause some of the data to be truncated; you can use the
Configure Error Output to handle such types of errors.
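A few illustrative expressions, written in the SSIS expression language (the column names are hypothetical, and the trailing notes are annotations rather than part of the expressions):

```text
FirstName + " " + LastName              -- concatenate two input columns
SUBSTRING(Phone, 1, 3)                  -- extract an area code
UnitPrice * Quantity * (1 + TaxRate)    -- mathematical calculation
ISNULL(MiddleName) ? "" : MiddleName    -- replace NULL values with an empty string
```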
Import Column transformation
The Import Column transformation reads data from a file and imports it to a column in the data flow.
This transformation does the opposite of the Export Column transformation by adding text and images
stored in separate files to a data flow.
The Import Column Transformation task contains three tabs:
Component Properties tab. The Component Properties tab allows you to define a Name and
Description for the task and configure the locale for the task by using the LocaleID property. The
ValidateExternalMetadata defines whether the transformation is validated against external data
during its design or when it is executed.
Input Columns tab. The Input Columns tab consists of the Available Input Columns property that
enables you to select the columns that the copy column operation will affect. When a column is
selected, it appears in the Input Columns list. The Output alias allows you to define the name of
the output column. The Usage Type property defines if the data imported is READONLY data or
READWRITE data.
Input and Output Properties tab. The Input and Output Properties tab enables you to configure
additional properties for the input and output columns.
Export Column transformation
The Export Column transformation allows you to take images and documents that are stored within
the data flow and export them to a file. Specifically, the data types that can be exported to the file
include DT_IMAGE, DT_TEXT and DT_NTEXT.
The Export Column Transformation Editor dialog box contains the following properties. The Extract
Column property allows you to select the input column to be transferred. The File Path Column must
point to a column within the input columns that specifies the file name. Both of these properties are
mandatory. You can then use the Allow Append and Force Truncate check boxes to determine whether
a new file is created, or whether an existing file, if present, is appended to or overwritten.
How the settings for the Append and Truncate options affect results
Append  Truncate  File exists  Results
False   False     No           The transformation creates a new file and writes the data to the file.
True    False     No           The transformation creates a new file and writes the data to the file.
False   True      No           The transformation creates a new file and writes the data to the file.
True    True      No           The transformation fails design-time validation. It is not valid to set both properties to True.
False   False     Yes          A run-time error occurs. The file exists, but the transformation cannot write to it.
False   True      Yes          The transformation deletes and re-creates the file and writes the data to the file.
True    False     Yes          The transformation opens the file and writes the data at the end of the file.
True    True      Yes          The transformation fails design-time validation. It is not valid to set both properties to True.
The Write Byte-Order Mark property specifies whether to write a byte-order mark (BOM) to the file. A
BOM is only written if the data has the DT_NTEXT or DT_WSTR data type and is not appended to an
existing data file.
Multiple Data Flow Transformations
Multiple Data Flow transformations enable you to take a data input and separate the data based on an
expression. For example, in the Conditional Split transformation, if your data flow includes employee
information, you can split the data flow according to the cities in which the employees work. Multiple
Data Flow transformations also enable you to join data together. For example, you can bring data
together from separate data sources by using transformations such as the Merge or Union All
transformations.
Conditional Split transformation
The Conditional Split transformation takes a single data flow input and creates multiple data flow
outputs based on multiple conditional expressions defined within the transformation. The order of the
conditional expressions is important. If a record satisfies the first condition, the data is moved based on
that condition even if it also meets the second condition; the record is then no longer available to be
evaluated against the second condition. An expression can combine functions and operators to define
a condition.
The Conditional Split Transformation Editor dialog box contains an expression editor and a number of
properties that can be used to configure the conditional split. The Order property determines the order
in which the conditions are evaluated. You can then provide an Output Name for the data that is output by
the condition. The Condition property allows you to define an expression that defines the condition.
Examples include:
SUBSTRING(FirstName,1,1) == "A"
TerritoryID == 1
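The first-match routing rule described above can be sketched as follows, with conditions written as Python predicates rather than SSIS expressions:

```python
def conditional_split(rows, conditions, default_output="Split Default Output"):
    """conditions: ordered list of (output_name, predicate).

    Each row goes to the FIRST condition it satisfies; later conditions
    never see it. Unmatched rows go to the default output.
    """
    outputs = {name: [] for name, _ in conditions}
    outputs[default_output] = []
    for row in rows:
        for name, predicate in conditions:
            if predicate(row):
                outputs[name].append(row)
                break
        else:
            outputs[default_output].append(row)
    return outputs

rows = [{"FirstName": "Ann", "TerritoryID": 1}, {"FirstName": "Bob", "TerritoryID": 1}]
result = conditional_split(rows, [
    ("Names starting with A", lambda r: r["FirstName"][:1] == "A"),
    ("Territory 1",           lambda r: r["TerritoryID"] == 1),
])
# Ann satisfies the first condition, so only Bob reaches the second one.
```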
You can use the Configure Error Output to handle errors.
Multicast transformation
The Multicast transformation allows you to output multiple copies of the same data flow input to
different data flow outputs. This transformation can be useful when you wish to output the same data
that will be transformed further down the data flow. For example, one output may then be summarized
using an aggregate transformation. The other output used as a basis to provide more detailed
information in a separate data flow.
The properties in the Multicast Transformation Editor dialog box can only be viewed once the outputs of
the transformation have been configured. Within the editor, an Outputs pane on the left shows you the
outputs that the Multicast transformation is generating. By selecting an output, the Properties pane
shows read-only information such as the Identification String and ID properties. The only properties
that you can change are the Name and Description properties.
The Multicast transformation does not support Error Output configuration.
Merge transformation
The Merge transformation takes multiple inputs into the transformation and merges the data together
from the separate inputs. A prerequisite to the merge input working successfully is that the input
columns are sorted. Furthermore, the columns that are sorted must also be of compatible data types.
For example, you cannot merge the input that has a character data type with a second input that has anumeric data type.
The Merge Transformation Editor dialog box consists of a number of columns dependent on how many
inputs are connected to the Merge transformation. For example, if three inputs are defined, then four
columns will appear; if two inputs are defined, then three columns appear and so on. The first column is
the Output column that allows you to define a name for the output data flow. The second column is
called Merge Input 1. In this column, you map the input column to the output column. The third column
is called Merge Input 2; again, you map the input column to the output column. If more input columns
are defined, the number of Merge Input columns increase.
The Merge transformation does not support Error Output configuration.
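The requirement for pre-sorted inputs can be illustrated with a simple Python sketch: merging two inputs that are already sorted on the merge key yields one sorted output.

```python
from heapq import merge as heap_merge

# Both inputs must already be sorted on the merge key (here, CustomerID).
input_1 = [{"CustomerID": 1, "Name": "Ann"}, {"CustomerID": 4, "Name": "Dan"}]
input_2 = [{"CustomerID": 2, "Name": "Ben"}, {"CustomerID": 3, "Name": "Cat"}]

merged = list(heap_merge(input_1, input_2, key=lambda r: r["CustomerID"]))
print([r["CustomerID"] for r in merged])  # [1, 2, 3, 4]
```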
Merge Join transformation
The Merge Join transformation is similar to the Merge transformation. However, it merges the data by
using a join, equivalent to the Transact-SQL FULL, LEFT or INNER join clauses. Like the Merge
transformation, the input columns must be sorted and the columns that are joined must have
compatible data types. You must also specify the type of join the Merge Join transformation will use
and how it will handle nulls in the data.
The Merge Join Transformation Editor dialog box has a Join Type drop-down list at the top that allows
you to specify the type of join that will be used in the transformation. The Input property enables you to
select the columns that the Merge Join operation will affect. When a column is selected, it appears in
the Input Columns list, and the Input column determines which data flow input the data comes from.
The Output alias allows you to define the name of the data flow output.
Union All transformation
The Union All transformation is very similar to the Merge transformation. The key difference is that the
Union All transformation does not require the input columns to be sorted. However, the columns that
are mapped must still have compatible data types.
The Union All Transformation Editor dialog box consists of a number of columns that are dependent on
how many inputs are connected to the Union All transformation. For example, if three inputs are
defined, then four columns will appear; if two inputs are defined, then three columns appear and so on.
The first column is the Output column that allows you to define a name for the output data flow. The
second column is called Union All Input 1. In this column, you map the input column to the output
column. The third column is called Union All Input 2; again, you map the input column to the output
column. If more input columns are defined, the number of Union All Input columns increases.
The Union All transformation does not support Error Output configuration.
Custom Transformations
Many of the transformations that are provided within SSIS will meet your business requirements when
performing ETL operations. However, there may be situations in which the built-in transformations do
not provide a solution. You can use the Script Component transformation to create custom
transformations by using .NET. The OLE DB Command transformation allows you to apply Transact-SQL
statements to data within a Data Flow path.
Script Component transformation
The Script Component transformation enables you to add custom data sources, transformations and
destinations by using .NET code, which can be programmed in Visual Basic (VB) 2008 or Visual C# 2008.
It is similar to the Script task within the control flow of an SSIS package but is used within the Data Flow
task.
In order to use the Script Component, the local machine on which the package runs must have Microsoft
Visual Studio Tools for Applications installed. This provides a rich environment for building the custom
scripts, including IntelliSense and its own Object Explorer. You can access Microsoft Visual Studio Tools
for Applications from within the Script Component on the Script page by clicking the Edit Script button.
The Script page is also where you can define the Scripting Language, and it allows you to specify a Name
and Description for the Script Component. You can also specify a locale with the LocaleID property and
whether the data flow is validated at run time or design time by using the ValidateExternalMetadata
property. You can also specify the ReadOnlyVariables and ReadWriteVariables that are available to the
Script Component.
When the Script Component is added to the data flow, you are first prompted to select the Script
Component Type. This will determine if the Script Component is used as a Source, a Transformation or a
Destination and will affect the Script Component Editor. The following properties can be configured:
Input Columns tab. The Input Columns tab consists of the Input Name property, which determines
the data flow input to use, and the Available Input Columns property, which enables you to select
the columns that the Script Component operation will affect. When a column is selected, it appears
in the Input Columns list. The Output alias allows you to define the name of the output column. The
Usage Type property defines whether the data is READONLY data or READWRITE data.
Input and Output Properties tab. The Input and Output Properties tab allows you to set the
properties of the input and the output columns.
Connections Manager tab. The Connections Manager tab allows you to define connection
information that is used by the Script Component. This will include a Name and Description
property for the connection. The Connections Manager property allows you to select a
predefined connection manager or Add or Remove connection managers.
Note that the Script Component does not support error outputs.
OLE DB Command transformation
The OLE DB Command transformation enables you to apply SQL statements to each row within the data
flow. The SQL statement can include data manipulation statements such as INSERT, UPDATE and
DELETE. The SQL statement can accept parameters that are represented as ? (question marks) within
the SQL statement. Each question mark will be called param_0, param_1 and so on. You can use the OLE DB Command transformation to make changes to the data as it passes through the data flow. For
example, a change in the tax rate for selling products can be updated by using the OLE DB Command
transformation as the data runs through the data flow. The changed data becomes the output of the
OLE DB Command transformation.
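The positional `?` parameter markers behave like those in any parameterized SQL API. The sketch below uses Python's sqlite3, which happens to share the `?` marker syntax, to illustrate the idea of one statement execution per data flow row; the table and column names are invented for the example and are not from the course.

```python
import sqlite3

# Hypothetical product table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER, ListPrice REAL)")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

# Like the OLE DB Command transformation, the statement uses ? parameter
# markers that are bound positionally (param_0, param_1, ...) for each row
# flowing through.
rows = [(11.0, 1), (22.0, 2)]  # (new price, key) -- one execution per row
for row in rows:
    conn.execute("UPDATE Product SET ListPrice = ? WHERE ProductID = ?", row)

prices = [p for (p,) in conn.execute(
    "SELECT ListPrice FROM Product ORDER BY ProductID")]
print(prices)  # [11.0, 22.0]
```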
The Advanced Editor for OLE DB Command dialog box contains four tabs that allow you to configure the
transformation:
Connections Manager tab. The Connections Manager tab allows you to define connection
information that is used within the data flow. This includes a Name and Description property for
the connection. The Connections Manager property allows you to select a predefined
connection manager.
Component Properties tab. The Component Properties tab allows you to specify a Name and
Description for the OLE DB Command task. You can also specify a locale with the LocaleID
property and whether the data flow is validated at run time or design time by using the
ValidateExternalMetadata property. In this same area, the SQLCommand property is where the
SQL statement is defined. You can use property expressions to define the content of the
SQLCommand property as well. The CommandTimeout property defines the number of seconds the
command has to run, and the DefaultCodePage property sets the code page for the SQL
statement.
Column Mappings tab. The Column Mappings tab allows you to map the columns from the data
flow input to the parameters that are defined in the SQLCommand property. This is done by
mapping the Available Input Columns to the Destination Columns.
Input and Output Properties tab. The Input and Output Properties tab allows you to set the
properties of the input and the output columns.
Slowly Changing Dimension Transformation
The Slowly Changing Dimension transformation performs a very important role when loading and
updating data within a dimension table within a data warehouse. Through the Slowly Changing
Dimension transformation, you can manage changes to the data.
Some of the data within a dimension may remain static. As such, you can define this data as a fixed attribute. Any changes that occur to this data will be treated as an error.
The Slowly Changing Dimension transformation supports two types of Slowly Changing Dimension. Type
1 Slowly Changing Dimension is an overwrite of the original data. This is referred to as a changing
attribute within the wizard. Here, no historical content is retained and this is useful to overwrite invalid
data values.
Type 2 Slowly Changing Dimension is referred to as a historical changing attribute. Here, changing data
will generate a new row of data. The business key will be used to identify that the records are related.
The use of a start and end date is also used to indicate which record is the current record.
The Type 3 Slowly Changing Dimension makes use of an additional attribute within the record to
identify a record's original value and an attribute for the most recent value. This is not supported directly
by the Slowly Changing Dimension Wizard. To overcome this, you can use a Slowly Changing Dimension
to identify a Type 3 column as fixed. On the output of these columns, you can then perform inserts and
updates on the column to perform Type 3 updates.
The Slowly Changing Dimension transformation task makes the process of managing dimension data
within a data warehouse straightforward.
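The Type 1 (overwrite) and Type 2 (new historical row) behaviours described above can be sketched as follows. This is an illustrative Python model of the concepts, not the transformation itself; the dimension is a list of dicts, and the business key, column names and dates are all invented for the example.

```python
from datetime import date

# A toy dimension table keyed on the business key 'ProductCode'.
dim = [{"ProductCode": "BK-01", "Color": "Red",
        "StartDate": date(2008, 1, 1), "EndDate": None}]

def apply_type1(dim, key, column, value):
    """Type 1 (changing attribute): overwrite in place, no history kept."""
    for row in dim:
        if row["ProductCode"] == key and row["EndDate"] is None:
            row[column] = value

def apply_type2(dim, key, column, value, change_date):
    """Type 2 (historical attribute): expire the current row, insert a new one."""
    for row in dim:
        if row["ProductCode"] == key and row["EndDate"] is None:
            row["EndDate"] = change_date          # close out the old version
            new_row = dict(row, **{column: value,
                                   "StartDate": change_date, "EndDate": None})
            dim.append(new_row)                   # current row has EndDate=None
            return

apply_type2(dim, "BK-01", "Color", "Blue", date(2008, 6, 1))  # history kept
apply_type1(dim, "BK-01", "Color", "Green")                   # overwrite only
current = [r for r in dim if r["EndDate"] is None]
print(len(dim), current[0]["Color"])  # 2 Green
```

The start/end dates identify the current record, exactly as the wizard's date-based option does.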
Implementing Data Flow Transformations: Part 2
Introduction
Lesson Introduction
Data Flow transformations can go beyond changing data by providing transformations that can perform data analysis, sampling and auditing.
Lesson Objectives
After completing this lesson, you will be able to:
Use Lookup and Cache transformation.
Use data analysis transformations.
Use data sampling transformations.
Use monitoring transformations.
Use fuzzy transformations.
Use term transformations.
Creating a Lookup and Cache Transformation
The Lookup transformation enables you to take information from an input column and then look up
additional information from another dataset that is linked to the input columns through a common
column. The dataset can be a table, view, SQL query or a cache file.
The Cache transformation was introduced in SQL Server 2008. It can be used to improve the performance of a Lookup transformation by connecting to a data source and
populating a cache file on the server on which the package runs. This means that the Lookup
transformation performs its lookup against the cache file rather than to a remote dataset. The Cache
transformation requires a connection manager to point to the .cache file and contains a Mappings tab
where you can map the input columns to the cache file. Note that one of the columns must be marked
as an index column.
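The idea behind the Lookup and Cache pair can be sketched as follows: the reference dataset is read once into a cache keyed on the common (index) column, and every data flow row is then matched against that cache rather than against the remote source. This is an illustrative Python model; the column names are invented for the example.

```python
# Reference dataset read once -- this stands in for the cache file population.
reference = [(1, "Bikes"), (2, "Components"), (3, "Clothing")]
cache = {key: name for key, name in reference}   # keyed on the index column

# Each data flow row looks up additional information through the common column.
rows = [{"ProductID": 10, "CategoryKey": 2},
        {"ProductID": 11, "CategoryKey": 3}]
for row in rows:
    row["CategoryName"] = cache[row["CategoryKey"]]  # lookup hits the cache

print([r["CategoryName"] for r in rows])  # ['Components', 'Clothing']
```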
Data Analysis Transformations
SSIS provides a range of data transformations that enable you to analyze data, as described
below.
Pivot transformation
The Pivot transformation takes data from a normalized result set and presents the data in a cross
tabulated or denormalized structure. For example, a normalized Orders data set that lists customer
name, product and quantity purchased typically has multiple rows for any customer who purchased
multiple products, with each row for that customer showing order details for a different product. By
pivoting the data set on the product column, the Pivot transformation can output a data set with a
single row per customer. That single row lists all the purchases by the customer, with the product names
shown as column names, and the quantity shown as a value in the product column. Because not every
customer purchases every product, many columns may contain null values.
The Advanced Editor for Pivot dialog box contains three tabs to configure the properties:
Component Properties tab. The Component Properties tab allows you to specify a Name and
Description for the Pivot transformation. You can also specify a locale with the LocaleID
property and whether the data flow is validated at run time or design time by using the
ValidateExternalMetadata property.
Input Columns tab. The Input Columns tab consists of the Available Input Columns property that
enables you to select the columns that the Pivot transformation operation will affect. When a
column is selected, it appears in the Input Columns list. The Output alias allows you to define
the name of the output column. The Usage Type property defines if the data imported is
READONLY data or READWRITE data.
Input and Output Properties tab. The Input and Output Properties tab allows you to set the
properties of the input and the output columns. The most important property here is the
PivotUsage property. This determines what role the input column will play in creating the pivot
table and can be configured with the following values:
o 0. The column is not pivoted, and the values are passed through to the transformation
output.
o 1. The column is part of the set key that identifies one or more rows as part of one set.
o 2. The column is a pivot column. At least one column is created from each column value.
The column must be a sorted input column.
o 3. The values from this column are placed in columns that are created because of the
pivot.
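The customer/product example above can be sketched as follows: the set key is the customer, the pivot column is the product, and the quantity values land in the generated product columns. This is an illustrative Python model of the pivot concept, with invented data.

```python
# Normalized input: (customer, product, quantity) -- one row per purchase.
orders = [("Ana", "Apples", 5), ("Ana", "Pears", 2), ("Ben", "Apples", 1)]

products = sorted({p for _, p, _ in orders})      # the generated pivot columns
pivoted = {}
for customer, product, qty in orders:
    # One output row per set-key value (the customer).
    row = pivoted.setdefault(customer, {p: None for p in products})
    row[product] = qty          # cells for unpurchased products stay None (null)

print(pivoted["Ana"], pivoted["Ben"])
```

As the text notes, customers who did not buy every product end up with null values in those columns.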
Unpivot transformation
The Unpivot transformation takes data from a denormalized or cross-tabulated result set and presents
the data in a normalized structure. The Unpivot transformation can be configured with the following
properties.
At the bottom of the Unpivot Transformation Editor dialog box is the Pivot key value column name.
Here, you define a column heading for the column that will hold the pivoted data that is converted into
normalized data such as Products or Fruits.
The Available Input Columns property enables you to select the input columns that the Unpivot
transformation operation turns into rows. When a column is selected, it appears in the Input Columns
list. Any columns that are not selected are passed through to the data flow output. The Destination
Column allows you to define the name of the destination column in the normalized output.
In the Unpivot scenario, multiple input columns are usually mapped to one destination column. For
example, the Available Input Columns may consist of column headings such as Apples, Pears and
Peaches. All of these input columns are mapped to a destination column named Fruits that may be
defined by the Pivot key value column name property.
The Pivot Key value property specifies the value that is used in the rows in the normalized result set and,
by default, uses the same name as the input column but can be changed.
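The fruits example above can be sketched as follows: the input columns Apples, Pears and Peaches all map to one destination column, and the pivot key value column records which input column each value came from. This is an illustrative Python model with invented data.

```python
# Cross-tabulated input: one column per fruit.
crosstab = [{"Customer": "Ana", "Apples": 5, "Pears": 2, "Peaches": None}]

normalized = []
for row in crosstab:
    for fruit in ("Apples", "Pears", "Peaches"):
        if row[fruit] is not None:                # nulls produce no output row
            normalized.append({"Customer": row["Customer"],
                               "Fruit": fruit,     # the pivot key value column
                               "Quantity": row[fruit]})

print(normalized)
```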
Data Mining Query transformation
The Data Mining Query transformation enables you to run Data Mining Expression (DMX) statements
that use prediction statements against a mining model. Prediction queries enable you to use data mining
to make predictions about sales or inventory figures as an example. You can then create a data flow
output of the results. One transformation can execute multiple prediction queries if the models are built
on the same data mining structure.
Mining Model tab. The Mining Model tab is used to provide an existing Connection to the
Analysis Services database. You can specify a new connection by clicking the New button. The
Mining Structure allows you to specify the Data Mining Structure that is to be used as a basis for
analysis. A list of mining models is then presented.
Query tab. The Query tab allows you to write the DMX prediction query. A Build New Query
button is provided to build the DMX prediction query through a builder.
Data Sampling Transformations
Data sampling transformations are useful when you want to extract sample data from the data flow or
you want to count the number of rows in the data flow. This can be useful in a number of different
scenarios. Ultimately, the objective is to create a small data output that can be used for testing or
development within the SSIS package.
Percentage Sampling transformation
The Percentage Sampling transformation allows you to select a percentage of random rows from a data
flow input. This can be useful to generate a smaller set of data that is representative of the whole data
that can be used for development purposes. For example, in data mining, you can randomly divide a
data set into two data sets; one for training the data-mining model, and one for testing the model.
A random number generator determines the randomness. If you use the Random Seed property, you can specify a
number that the transformation will use. If you use the same number, it will always return the same
result set if the sampling is based on the same source data.
The Percentage Sampling transformation contains one screen that holds the properties to be configured.
You can specify the percentage number of rows to take from the data flow input by using the
Percentage of Rows property. You can also provide a name for the data flow outputs generated for both
the Sample Output Name and the Unselected Output Name. You can define your own random seed by
specifying a value in the Specify random seed value property.
Row Sampling transformation
The Row Sampling transformation allows you to select an exact number of random rows from a data
flow input. This can be useful to generate a smaller set of data that is representative of the whole data and that can be used for development purposes. For example, a company can randomly select 50 employees
from its employee database to receive Christmas prizes for a calendar year, generating the exact number
of winners.
A random number determines the randomness. If you use the Random Seed property, you can specify a
number that the transformation uses. If you use the same number, it always returns the same result set
if the sampling is based on the same source data.
The Row Sampling transformation contains two pages that hold properties to be configured:
Sampling page. The Sampling page allows you to specify the exact number of rows to take from the data flow input by using the Number of Rows property. You can also provide a name for the
data flow outputs generated for both the Sample Output Name and the Unselected Output
Name. You can define your own random seed by specifying a value in the Specify random seed
value property.
Columns page. The Columns page consists of the Available Input Columns property that enables
you to select the columns that the Row Sampling transformation operation affects. When a
column is selected, it appears in the Input Columns list. The Output alias allows you to define
the name of the output column.
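The repeatable-seed behaviour shared by both sampling transformations can be sketched as follows: with the same seed and the same source rows, the same sample (and the same unselected output) comes back every time. This is an illustrative Python model, not the transformations themselves.

```python
import random

rows = list(range(100))

def sample_rows(rows, n, seed):
    """Pick an exact number of random rows; a fixed seed makes it repeatable."""
    rng = random.Random(seed)            # the Random Seed property's role
    sample = rng.sample(rows, n)         # the sample output
    chosen = set(sample)
    unselected = [r for r in rows if r not in chosen]  # the unselected output
    return sample, unselected

first, _ = sample_rows(rows, 10, seed=42)
second, rest = sample_rows(rows, 10, seed=42)
print(first == second, len(rest))  # True 90
```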
Row Count transformation
A Row Count transformation counts the rows that pass through the data flow and stores the result of
the count in a variable. This variable can then be used elsewhere in the SSIS package. The following
properties can be configured:
Component Properties tab. The Component Properties tab allows you to specify a Name and
Description for the Row Count transformation. You can also specify a locale with the LocaleID
property and whether the data flow is validated at run time or design time by using the
ValidateExternalMetadata property. The most important property here is the Variable property.
You use this to map the result of the Row Count transformation to a user-defined variable.
Input Columns tab. The Input Columns tab consists of the Available Input Columns property that
enables you to select the columns that the Row Count operation affects. When a column is
selected, it appears in the Input Columns list. The Output alias allows you to define the name of
the output column. The Usage Type property defines if the data imported is READONLY data or
READWRITE data.
Input and Output Properties tab. The Input and Output Properties tab allows you to set the
properties of the input and the output columns.
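The count-into-a-variable behaviour can be sketched as follows: rows pass through unchanged, and once the data flow finishes, the total lands in a variable that later tasks can read. This is an illustrative Python model; the dict stands in for SSIS package variables and the variable name is invented.

```python
# A plain dict stands in for the package's variable collection.
variables = {"User::RowCount": 0}

def data_flow(rows, variables):
    """Pass rows through unchanged while counting them."""
    count = 0
    for row in rows:
        count += 1
        yield row
    variables["User::RowCount"] = count   # written when the data flow ends

out = list(data_flow(["a", "b", "c"], variables))
print(variables["User::RowCount"])  # 3
```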
Audit Transformations
The Audit transformation allows you to create additional output columns within the data flow that hold
metadata about the SSIS package. This metadata maps to system variables that exist
within the SSIS package. The following information is available
within the Audit transformation and appears in a drop-down list in the AuditType property:
ExecutionInstanceGUID
PackageID
PackageName
VersionID
ExecutionStartTime
MachineName
UserName
TaskName
TaskId
The only other property to configure in the Audit transformation is the Output Column Name that allows
you to define a name for the columns that are used in the data flow output.
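The effect of the Audit transformation can be sketched as follows: the same metadata values are appended as extra output columns on every row. This is an illustrative Python model; the package name is a made-up example, and only a few of the AuditType options are mirrored.

```python
import socket
from datetime import datetime

# Metadata captured once per execution, mirroring a few AuditType options.
audit = {"PackageName": "AWStaging",              # hypothetical package name
         "MachineName": socket.gethostname(),
         "ExecutionStartTime": datetime.now().isoformat()}

rows = [{"ProductID": 1}, {"ProductID": 2}]
# Every output row carries the same audit columns alongside its own data.
audited = [dict(row, **audit) for row in rows]

print(all(r["PackageName"] == "AWStaging" for r in audited))  # True
```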
Fuzzy Transformations
Fuzzy transformations can be very useful for improving the data quality of existing data as well as new
data that is being loaded into your database.
Fuzzy Lookup
The Fuzzy Lookup transformation performs data cleansing tasks such as standardizing data, correcting
data and providing missing values.
Using the fuzziness capability that is available to the Fuzzy Grouping transformation, this logic can be
applied to Lookup operations so that it can return data from a dataset that may closely match the
Lookup value required. This is what separates the Fuzzy Lookup transformation from the Lookup
transformation, which requires an exact match. Note that the connection to SQL Server must resolve to
a user who has permission to create tables in the database.
The Fuzzy Lookup Transformation Editor dialog box consists of three tabs to configure:
Reference Table tab. The Reference Table tab allows you to define connection information that
is used within the data flow. This includes an OLE DB Connection Manager property for the
connection. The Reference table property allows you to select the reference table. You can also
choose whether to create new or use existing indexes with the Store New Index or Use Existing
Index Property.
Columns tab. The Columns tab consists of the Available Input Columns and Available Lookup
Columns property that enables you to select the columns that the Fuzzy Lookup transformation
operation affects. When a column is selected in the Available Lookup Columns, it appears in the
Lookup Columns list. The Output alias allows you to define the name of the output column.
Advanced tab. The Advanced tab sets the Similarity threshold property, which is a slider. The
closer the threshold is to one, the more the rows must resemble each other to qualify as
duplicates. You can also tokenize data using the Token delimiters property.
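The threshold-based matching idea can be sketched as follows: an incoming value is compared against every reference row, and the best candidate is accepted only if its similarity score clears the threshold. This is an illustrative Python model; difflib's ratio stands in for the transformation's own token-based similarity algorithm, and the reference data is invented.

```python
from difflib import SequenceMatcher

# Hypothetical reference table: name -> key.
reference = {"Mountain Bike": 1, "Road Bike": 2, "Touring Bike": 3}

def fuzzy_lookup(value, reference, threshold=0.8):
    """Return (key, score) for the closest reference name above the threshold."""
    best_name, best_score = None, 0.0
    for name in reference:
        score = SequenceMatcher(None, value.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:          # closer to one = closer match
        return reference[best_name], best_score
    return None, best_score              # no row qualifies as a match

key, score = fuzzy_lookup("Mountian Bike", reference)   # misspelled input
print(key)  # 1
```

A plain Lookup would have rejected the misspelled value outright; the fuzzy version still resolves it, which is the distinction the text draws.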
Fuzzy Grouping
The Fuzzy Grouping transformation allows you to standardize and cleanse data by selecting likely duplicate
data and comparing it to an alias row of data that is used to standardize the input data. As a result, a
connection to SQL Server is required, as the Fuzzy Grouping transformation requires a temporary table to
perform its work.
The Fuzzy Grouping transformation allows you to perform an exact match or a fuzzy match. An exact match
means that the data must exactly match for it to be part of the same group. A fuzzy match groups
together data that is approximately the same. You can determine the fuzziness by configuring numerous properties to determine how dissimilar data values can be.
The Fuzzy Grouping Transformation Editor dialog box consists of three tabs:
Connection Managers tab. The Connection Managers tab allows the Fuzzy Grouping transformation
to create the temporary table required to perform the Fuzzy Grouping transformation. You use the
OLE DB Connection Manager property to point to an existing OLE DB connection or click on New
to create a new OLE DB connection.
Columns tab. The Columns tab consists of the Available Input Columns property that enables
you to select the columns that the Fuzzy Grouping transformation operation affects. When a
column is selected, it appears in the Input Columns list. The Output alias allows you to define
the name of the output column. The Group Output Alias allows you to define a group name for
the data that is grouped together. The Match Type property defines the type of fuzzy operation
that is conducted, which can be exact or fuzzy. You can determine the fuzziness by using the
Minimum Similarity property; a value close to one means that the data must be nearly identical.
The Similarity Output Alias generates a new output column that contains the similarity
scores for the selected join. You can specify how leading and trailing numerals are evaluated by
using the Numerals property, and Comparison Flags can be used to ignore spaces or character
widths.
Advanced tab. The Advanced tab sets the Input key column name for the output column that
contains the unique identifier for each input row; Output key column name for the output
column that contains the unique identifier for the alias row of a group of duplicate rows; and
Similarity score column name for the name of the column that contains the similarity score. The
Similarity threshold property is a slider. The closer the threshold is to one, the more the rows must resemble each other to qualify as duplicates. You can also tokenize data by using the
Token delimiters property.
Term Transformations
You have the ability to extract nouns only, noun phrases only or both nouns and noun phrases from
descriptive columns with the Term Extraction and Term Lookup transformations.
Term Extraction transformation
The Term Extraction transformation allows data flow inputs to be compared to a built-in dictionary to
extract nouns only, noun phrases only or both nouns and noun phrases. A noun phrase can
include two words: one a noun and the other an adjective. The transformation can also stem nouns to extract the
singular noun from a plural noun, so cars becomes car. This extraction forms the basis of the data
flow output. This capability is only available with the English language.
The Term Extraction Transformation Editor dialog box contains three tabs to configure:
Term Extraction tab. The Term Extraction tab specifies a text column that contains text to be
extracted. The Available Input Columns property enables you to select the columns that the
Term Extraction transformation operation affects. You can define an output column name for the Term that is extracted by using the Term property. The Score property allows you to define a
column name for the score that is assigned to the extracted term column.
Exclusion tab. The Exclusion tab allows you to point to a table that consists of a list of terms that
are excluded from the term extraction. This includes an OLE DB Connection Manager property
for the connection. The Table or View and Column property allows you to select the column
within the table that holds the exclusion terms.
Advanced tab. The Advanced tab allows you to set the term extraction type by using the Term
Type property set to nouns only, noun phrases only or both nouns and noun phrases. The Score
type property sets the basis for scoring the terms by using frequency or Term Frequency Inverse
Document Frequency (TFIDF: a term scoring algorithm). You can specify case-sensitive
extractions and set Parameters for the Frequency Threshold, which specifies how often a
word must appear before it is extracted, and the Maximum length of Term, which defines the
maximum number of characters in a word to perform the term extraction on.
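The frequency-based scoring and threshold described above can be sketched as follows. This is an illustrative Python model with invented text: it tokenizes, applies a trivial plural stem (cars becomes car), counts terms, and keeps only those that clear the frequency threshold. Real term extraction also does part-of-speech tagging and stop-word handling, which this sketch skips entirely.

```python
from collections import Counter

text = "The cars passed the car lot. Cars and bikes waited at the lot."

def extract_terms(text, frequency_threshold=2):
    """Score terms by frequency; keep those at or above the threshold."""
    tokens = [w.strip(".,").lower() for w in text.split()]
    # Naive stemming: strip a trailing 's' from longer words (cars -> car).
    stemmed = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    counts = Counter(stemmed)
    return {term: score for term, score in counts.items()
            if score >= frequency_threshold}

terms = extract_terms(text)
print(terms["car"], terms["lot"])  # 3 2
```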
Term Lookup transformation
The Term Lookup transformation can perform an extraction of terms from a reference table rather than
the built-in dictionary. It counts the number of times a term in the Lookup table occurs in the input data
set, and writes the count together with the term from the reference table to columns in the
transformation output.
Reference Table tab. The Reference Table tab allows you to define connection information that is used
within the data flow. This includes an OLE DB Connection Manager property for the connection. The Reference table property allows you to select the reference table.
Term Lookup tab. The Term Lookup tab consists of the Available Input Columns and Available Reference
Columns property that enables you to select the columns that the Term Lookup transformation
operation affects. When a column is selected in the Available Input Columns, it appears in the Pass
through Columns list. The Output Column alias allows you to define the name of the output data flow
column.
Advanced tab. The Advanced tab has the Use case-sensitive term lookup to add case sensitivity to the
Term Lookup transformation.
Best Practices
Use the correct data sources from the Data Flow Sources section in the Business Intelligence
Development Studio Toolbox that will extract data.
Use the correct data destinations from the Data Flow Destinations section in the Business Intelligence Development Studio Toolbox that will load the data.
Use OLE DB data sources to connect to SQL Server tables, the Access database and Excel 2007
spreadsheet.
Use the ADO.NET data source to connect to ODBC data sources and destinations.
Identify the transformation required to meet the data load requirements.
Use in-built transformations when possible.
Use the Script Component Data Flow transformation to create custom data sources, data
destinations or transformations.
Use Data Flow paths to control transformations within the Data Flow transformations.
Use the Slowly Changing Dimension transformation to manage changing data in dimension
tables in a data warehouse.
Use the Lookup transformation to load a fact table in a data warehouse with the correct data.
Use the Cache transformation in conjunction with the Lookup transformation to improve the
performance of loading fact tables.
Lab: Implementing Data Flow in SQL Server Integration Services 2008
Lab Overview
Lab Introduction
The purpose of this lab is to focus on using data flows within an SSIS package to populate a simple data warehouse. You will first edit an existing package to add data sources and destinations and use
common transformations to complete the loading of the StageProduct table. You will also implement a
data viewer in this package and run the package to ensure that data is being loaded correctly into the
ProductStage table. You will then create the dimension tables in the data warehouse focusing
specifically on the Slowly Changing Dimension task to manage changing data in the dimension tables.
You will finally explore how to populate the fact table within the data warehouse by using the Lookup
transformation to ensure that the correct data is being loaded into the fact table.
Lab Objectives
After completing this lab, you will be able to:
Define data sources and destinations.
Work with data flow paths.
Implement data flow transformations.
Scenario
You are a database professional for Adventure Works, a manufacturing company that sells bicycle and
bicycle components through the Internet and a reseller distribution network. You are continuing to work
on using SSIS to populate a simple data warehouse for testing purposes in a database named
AdventureWorksDWDev.
You want to complete the AWStaging package by configuring the Data Flow task that will load data into
the ProductStage table. You will implement simple transformations that you think you will use in the
production data warehouse. To verify that the transformations are working, you will add data viewers to
the data path to view the data before and after the transformation has occurred.
You will then edit the package named AWDataWarehouse. You will firstly edit a Data Flow task to
explore common transformations that are used within the data flow. However, you want to explore the
use of the Slowly Changing Dimension task to manage data changes when transferring data from the
ProductStage to the ProductDim table.
Finally, you will edit the LoadFact Data Flow task that will populate the FactSales table, which will use a
Lookup transformation to ensure that the correct data is loaded into the fact table.
Exercise Information
Exercise 1: Defining Data Sources and Destinations
In this exercise, you will complete the configuration of the AWStaging package by configuring the Data
Flow task that will populate the ProductStage table. You will define the data source as the
AdventureWorks2008 database. You will then use transformations to ensure that the data is cleanly loaded into the ProductStage table. You will then define the data destination as the ProductDim table in
the AdventureworksDWDev database.
Exercise 2: Working with Data Flow Paths
In this exercise, you will add an error Data Flow path from the AdventureWorksDWDev StageProduct
Data Flow task to a text file named StageProductLoadErrors.txt located in D:\Labfiles\Starter folder. You
will add a data viewer before and after the Category Uppercase Character Map transformation. You will
then run the package and review the data viewer before and after the Category Uppercase Character
Map transformation runs to view the differences in the data. After completing the review, you will
remove the data viewers.
Exercise 3: Implementing Data Flow Transformations
In this exercise, you will edit the package AWDataWarehouse. You will firstly edit the Generate Resellers
Data Data Flow task to explore common transformations that are used within the data flow. However,
you want to explore the use of the Slowly Changing Dimension task to manage changes of data when
transferring data from the ProductStage to the ProductDim table that is defined within the Generate
Product Data Data Flow task.
Finally, you will edit the Generate FactSales Data Data Flow task that will populate the FactSales table
that will use a Lookup transformation to ensure that the correct data is loaded into the fact table.
Lab Instructions: Implementing Data Flow in SQL Server Integration Services
2008
Exercise 1: Defining Data Sources and Destinations
Exercise Overview
In this exercise, you will complete the configuration of the AWStaging package by configuring the data flow task that will populate the ProductStage table. You will define the data source as the
AdventureWorks2008 database. You will then use transformations to ensure that the data is loaded cleanly into the ProductStage table. You will then define the data destination as the ProductDim table in the AdventureworksDWDev database.
Task 1: Log on to MIAMI with the username Student and password Pa$$w0rd. If you are
already logged on, proceed to the next task
Log on to the MIAMI server.
a. To log on to the MIAMI server, press CTRL+ALT+DELETE.
b. On the Login screen, click the Student icon.
c. In the Password box, type Pa$$w0rd, and then click the Forward button.
Task 2: Open Business Intelligence Development Studio and open the solution file AW_BI
solution located in D:\Labfiles\Starter\AW_BI folder
1. Open the Microsoft Business Intelligence Development Studio.
2. Open the AW_BI solution file in D:\Labfiles\Starter\AW_BI folder.
Task 3: Open the AWStaging package in the AW_SSIS project in the AW_BI solution
Open the AWStaging package in Business Intelligence Development Studio.
Task 4: Edit the Load Products Data Flow task and add an OLE DB Source to the data flow
designer that is configured to retrieve data from the Production.Product table in the
AdventureWorks2008 database
1. Open the Load Products Data Flow Designer in the AWStaging package in Business Intelligence Development Studio.
2. Add an OLE DB Source data flow source from the Toolbox onto the Data Flow Designer. Name the OLE DB Source data flow source AdventureWorks2008 Products.
3. Edit the AdventureWorks2008 Products OLE DB data source by retrieving the ProductID, Name, SubCategory name, Category name, ListPrice, Color, Size, Weight, DaysToManufacture, SellStartDate and SellEndDate from the Production.Product, Production.ProductSubcategory and Production.ProductCategory tables in the AdventureWorks2008 database. Add a WHERE clause that returns all products modified after the date stored in the ProductLastExtract variable.
4. Save the AW_BI solution.
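A source query along the following lines would satisfy step 3. This is a sketch only: the join columns follow the AdventureWorks2008 schema, but the choice of SellStartDate as the incremental-extract filter column is an assumption, and the ? parameter marker must be mapped to the ProductLastExtract variable in the OLE DB Source Editor.

```sql
-- Sketch of the source query for the AdventureWorks2008 Products OLE DB Source.
-- The filter column (SellStartDate) is an assumption; map the ? parameter
-- to the User::ProductLastExtract variable in the OLE DB Source Editor.
SELECT p.ProductID,
       p.Name,
       ps.Name AS SubCategory,
       pc.Name AS Category,
       p.ListPrice,
       p.Color,
       p.Size,
       p.Weight,
       p.DaysToManufacture,
       p.SellStartDate,
       p.SellEndDate
FROM Production.Product AS p
JOIN Production.ProductSubcategory AS ps
    ON p.ProductSubcategoryID = ps.ProductSubcategoryID
JOIN Production.ProductCategory AS pc
    ON ps.ProductCategoryID = pc.ProductCategoryID
WHERE p.SellStartDate > ?;
```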
Task 5: Add a Character Map transformation to the Load Products Data Flow Designer that
is configured to transform the data in the Category column to uppercase. Name the
transformation Category Uppercase and set the Data Flow path from the
AdventureWorks2008 Products Data Flow task to the Category Uppercase transformation
1. Add a Character Map transformation from the Toolbox onto the Data Flow Designer. Name the Character Map transformation Category Uppercase.
2. Set the Data Flow path from the AdventureWorks2008 Products Data Flow task to the Category Uppercase transformation.
Task 6: Edit the Category Uppercase Character Map transformation to change the
character set of the Category column to uppercase
1. Edit the Category Uppercase Character Map transformation to change the character set of the Category column to uppercase.
2. Save the AW_BI solution.
Task 7: Edit the Load Products Data Flow task and add an OLE DB Destination to the Data
Flow Designer named AdventureWorksDWDev StageProduct. Then set the Data Flow path
from the Category Uppercase transformation to the AdventureWorksDWDev StageProduct
OLE DB Destination
1. Add an OLE DB Destination data flow destination from the Toolbox onto the Data Flow Designer. Name the OLE DB Destination AdventureWorksDWDev StageProduct.
2. Set the Data Flow path from the Category Uppercase transformation to the AdventureWorksDWDev StageProduct OLE DB Destination.
Task 8: Edit the AdventureWorksDWDev StageProduct OLE DB Destination to load the
data into the StageProduct table and remove the Check constraints option
1. Edit the AdventureWorksDWDev StageProduct OLE DB Destination to load the data into the StageProduct table in the AdventureWorksDWDev database, clearing the Check constraints option.
2. Edit the AdventureWorksDWDev StageProduct OLE DB Destination by performing column mapping between the source and destination data.
3. Save and close the AW_BI solution.
Task 9: You have completed all tasks in this exercise
A successful completion of this exercise results in the following outcomes:
a. You have created an OLE DB Source data flow source.
b. You have created a Transact-SQL statement to query the source data.
c. You have created a simple character map transformation.
d. You have created an OLE DB Destination data flow destination.
Exercise 2: Working with Data Flow Paths
Exercise Overview
In this exercise, you will add an error Data Flow path from the AdventureWorksDWDev StageProduct OLE DB Destination to a text file named StageProductLoadErrors.txt located in the D:\Labfiles\Starter folder. You will add a data viewer before and after the Category Uppercase Character Map transformation. You will then run the package and review the data viewers before and after the Category Uppercase Character Map transformation runs to view the differences in the data. After completing the review, you will remove the data viewers.
Task 1: Open Business Intelligence Development Studio and open the solution file AW_BI
solution located in D:\Labfiles\Starter\AW_BI folder
1. Open the Microsoft Business Intelligence Development Studio.
2. Open the AW_BI solution file in D:\Labfiles\Starter\AW_BI folder.
Task 2: Open the AWStaging package in the AW_SSIS project in the AW_BI solution
Open the AWStaging package in Business Intelligence Development Studio.
Task 3: Edit the Load Products Data Flow task and add a Flat File Destination to the Data
Flow Designer that is configured to write to a text file named StageProductLoadErrors.txt
located in the D:\Labfiles\Starter folder
1. Open the Load Products Data Flow Designer in the AWStaging package in Business Intelligence Development Studio.
2. Add a Flat File Destination data flow destination from the Toolbox onto the Data Flow Designer. Name the Flat File Destination StageProduct Load Errors.
Task 4: Create an Error Data Flow path from the AdventureWorksDWDev StageProduct
OLE DB Destination to the StageProduct Load Errors Flat File Destination
Set the error Data Flow path from the AdventureWorksDWDev StageProduct OLE DB Destination to the StageProduct Load Errors Flat File Destination.
Task 5: Edit the StageProduct Load Errors Flat File Destination creating a connection to
the StageProductLoadErrors.txt located in D:\Labfiles\Starter folder. Name the
connection StageProduct Errors
1. Configure the StageProduct Load Errors Flat File Destination to create a text file named StageProductLoadErrors.txt located in D:\Labfiles\Starter. Name the connection StageProduct Errors.
2. Review the column mappings between the AdventureWorksDWDev StageProduct OLE DB Destination and the StageProduct Load Errors Flat File Destination.
Task 6: Edit the AdventureWorksDWDev StageProduct OLE DB Destination to redirect
rows when an error is encountered
Configure AdventureWorksDWDev StageProduct OLE DB Destination to redirect rows when
errors are encountered in the data flow.
Task 7: Add a Grid Data Viewer in the Data Flow path between the AdventureWorks2008
Products OLE DB Source and the Category Uppercase Character Map transformation
Add a Grid Data Viewer in the Data Flow path between the AdventureWorks2008 Products OLE DB Source and the Category Uppercase Character Map transformation.
Task 8: Add a Grid Data Viewer in the Data Flow path between the Category Uppercase
Character Map transformation and the AdventureworksDWDev StageProduct OLE DB
Destination. Then save the AW_BI solution
1. Add a Grid Data Viewer in the Data Flow path between the Category Uppercase Character Map transformation and the AdventureWorksDWDev StageProduct OLE DB Destination.
2. Save the AW_BI solution.
Task 9: Execute the Load Products Data Flow task to view the data viewers to confirm
that the transform has worked correctly. Observe the data load into the StageProduct
table of the AdventureWorksDWDev database and for any records that have failed verify
that the data has loaded into the StageProductLoadErrors.txt file located in the
D:\Labfiles\Starter folder
1. Execute the Load Products Data Flow task and view the data viewers that execute.
2. View the AdventureWorksDWDev StageProduct OLE DB Destination and confirm that 295 rows are inserted into the StageProduct table.
3. View the data in the StageProduct table in the AdventureWorksDWDev database by using SQL Server Management Studio.
4. Confirm that the StageProductLoadErrors.txt file located in the D:\Labfiles\Starter folder contains 50 records.
Task 10: Clean out the data from the StageProduct table and the
StageProductLoadErrors.txt file. Remove the data viewers and correct the error that is
occurring with the Load Products Data Flow task
1. In Notepad, delete the data within the StageProductLoadErrors.txt text file.
2. Remove the data from the StageProduct table in the AdventureWorksDWDev database.
In the Query Window, type in the following code.
USE AdventureWorksDWDev
GO
DELETE FROM StageProduct
GO
SELECT * FROM StageProduct
3. Stop debugging in Business Intelligence Development Studio and remove the data viewers from the Load Products Data Flow task.
4. Edit the AdventureWorks2008 Products OLE DB data source by changing the query to replace Null values returned in the Color column with the value of None.
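One way to make the change described in step 4 is to wrap the Color column in ISNULL in the source query. This is a sketch showing only the relevant part of the SELECT list; the rest of the query is unchanged.

```sql
-- Replace NULL values in the Color column with 'None' at the source,
-- so these rows no longer fail when loaded into the destination table.
SELECT p.ProductID,
       ISNULL(p.Color, 'None') AS Color
       -- remaining columns as before
FROM Production.Product AS p;
```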
Task 11: Execute the Load Products Data Flow task and confirm that the data now loads
into the StageProduct table without errors
1. Execute the Load Products Data Flow task.
2. Confirm that the StageProductLoadErrors.txt file located in the D:\Labfiles\Starter folder contains 0 records.
3. View the data in the StageProduct table in the AdventureWorksDWDev database by using SQL Server Management Studio.
4. Remove the data from the StageProduct table in the AdventureWorksDWDev database.
In the Query Window, type in the following code.
USE AdventureWorksDWDev
GO
DELETE FROM StageProduct
GO
SELECT * FROM StageProduct
Task 12: Save and close the AW_BI solution in Business Intelligence Development Studio
Save and close the AW_BI solution.
Task 13: You have completed all tasks in this exercise
A successful completion of this exercise results in the following outcomes:
a. You have created and configured an error data path.
b. You have added data viewers to the Data Flow path.
c. You have observed the effects of Data Flow paths.
d. You have corrected errors in a data flow and observed the successful completion of
a Data Flow path.
Exercise 3: Implementing Data Flow Transformations
Exercise Overview
In this exercise, you will edit the package AWDataWarehouse. You will first edit the Generate Resellers Data Data Flow task to explore common transformations that are used within the data flow. You will then use the Slowly Changing Dimension task to manage changes to data when transferring data from the StageProduct table to the DimProduct table within the Generate Product Data Data Flow task. Finally, you will edit the Generate FactSales Data Data Flow task, which populates the FactSales table and uses a Lookup transformation to ensure that the correct data is loaded into the fact table.
Task 1: Open Business Intelligence Development Studio and open the solution file AW_BI
solution located in D:\Labfiles\Starter\AW_BI folder
1. Open the Microsoft Business Intelligence Development Studio.
2. Open the AW_BI solution file in D:\Labfiles\Starter\AW_BI folder.
Task 2: Open the AWDataWarehouse package in the AW_SSIS project in the AW_BI
solution
Open the AWDataWarehouse package in Business Intelligence Development Studio.
Task 3: Edit the Generate Resellers Data Data Flow task in the AWDataWarehouse
package and add an OLE DB Source to the Data Flow Designer that is configured to
retrieve data from the dbo.StageReseller table in the AdventureWorksDWDev database
1. Open the Generate Resellers Data Data Flow task in the AWDataWarehouse package in Business Intelligence Development Studio.
2. Add an OLE DB Source data flow source from the Toolbox onto the Data Flow Designer. Name the OLE DB Source data flow source AdventureWorksDWDev StageResellers.
3. Edit the AdventureWorksDWDev StageResellers OLE DB data source by retrieving all columns from the StageReseller table in the AdventureWorksDWDev database.
4. Save the AW_BI solution.
Task 4: Add a Conditional Split transformation that will keep all of the Resellers with an
AddressType of Main Office within the dimension table data load and output other
address types to a text file named NonMainOffice.txt in the D:\Labfiles\Starter folder.
Name the Conditional Split transformation MainOffice
1. Add a Conditional Split transformation from the Toolbox onto the Data Flow Designer. Name the Conditional Split transformation MainOffice.
2. Configure the MainOffice Conditional Split transformation to identify records that have an AddressType of Main Office and those records that do not.
3. Create the Flat File Destination and name the Flat File Destination NonMainOffices.
4. Set the Data Flow path for non-Main Office records from the MainOffice Conditional Split transformation to the NonMainOffices Flat File Destination.
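In the Conditional Split Transformation Editor, the condition for step 2 can be written in the SSIS expression language. A sketch, assuming the input column is named AddressType:

```
AddressType == "Main Office"
```

Rows matching this condition go to the Main Office output, which continues into the dimension load; the default output carries all other address types to the NonMainOffices Flat File Destination.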
Task 5: Add a Sort transformation named CountryRegionSort below the MainOffice
Conditional Split transformation and drag a Data Flow path from the MainOffice
Conditional Split transformation to the CountryRegionSort Sort transformation
1. Add a Sort transformation from the Toolbox onto the Data Flow Designer. Name the Sort transformation CountryRegionSort.
2. Set the Data Flow path from the MainOffice Conditional Split transformation to the CountryRegionSort Sort transformation.
3. Configure the CountryRegionSort Sort transformation to sort by CountryRegionName.
Task 6: Edit the Generate Reseller Data Data Flow task and add an OLE DB Destination to
the Data Flow Designer named AdventureWorksDWDev DimReseller. Then set the Data
Flow path from the CountryRegionSort transformation to the AdventureWorksDWDev
DimReseller OLE DB Destination
1. Add an OLE DB Destination data flow destination from the Toolbox onto the Data Flow Designer. Name the OLE DB Destination AdventureWorksDWDev DimReseller.
2. Set the Data Flow path from the CountryRegionSort transformation to the AdventureWorksDWDev DimReseller OLE DB Destination.
Task 7: Edit the AdventureWorksDWDev DimReseller OLE DB Destination to load the data
into the DimReseller table and remove the Check constraints option
1. Edit the AdventureWorksDWDev DimReseller OLE DB Destination to load the data into the DimReseller table in the AdventureWorksDWDev database, clearing the Check constraints option.
2. Edit the AdventureWorksDWDev DimReseller OLE DB Destination by performing column mapping between the source and destination data.
3. Save the AW_BI solution.
Task 8: Edit the Generate Product Data Data Flow task in the AWDataWarehouse package
and add an OLE DB Source to the Data Flow Designer that is configured to retrieve data from the dbo.StageProduct table in the AdventureWorksDWDev database
1. Open the Generate Product Data Data Flow task in the AWDataWarehouse package in Business Intelligence Development Studio.
2. Add an OLE DB Source data flow source from the Toolbox onto the Data Flow Designer. Name the OLE DB Source data flow source AdventureWorksDWDev StageProducts.
3. Edit the AdventureWorksDWDev StageProducts OLE DB data source by retrieving all columns from the StageProduct table in the AdventureWorksDWDev database.
4. Save the AW_BI solution.
Task 9: Edit the Generate Product Data Data Flow task in the AWDataWarehouse package
and add a Slowly Changing Dimension task that loads data into the DimProduct table and
treats the Category and Subcategory data as changing attributes and the EnglishProductName as a historical attribute. All remaining columns will be treated as a
fixed attribute
1. Open the Generate Product Data Data Flow task in the AWDataWarehouse package in Business Intelligence Development Studio.
2. Add a Slowly Changing Dimension Data Flow task to the Data Flow Designer and then create a Data Flow path from the AdventureWorksDWDev StageProducts OLE DB data source to the Slowly Changing Dimension.
3. Run the Slowly Changing Dimension Wizard, selecting DimProduct as the destination table and the ProductAlternateKey column as the business key.
4. In the Slowly Changing Dimension Wizard, treat the Category and Subcategory data as changing attributes and the EnglishProductName as a historical attribute. All remaining columns will be treated as fixed attributes.
5. In the Slowly Changing Dimension Wizard, set the wizard to fail transformations with changes to fixed attributes and use start and end dates to identify current and expired records based on the System::StartTime variable. Disable the inferred members support.
6. Save the AW_BI solution.
6. Save the AW_BI solution.
Task 10: Review the FactSales table in the AdventureWorksDWDev database removing
the ExtendedAmount, UnitPriceDiscountPct, TotalProductCost and TaxAmount columns.
Then, edit the Generate FactSales Data Data Flow task to load the FactSales table with
the correct data
1. Open SQL Server Management Studio and view the columns in the FactSales table of the AdventureWorksDWDev database.
2. Maximize Business Intelligence Development Studio and add an OLE DB data source to the AdventureWorks2008 database within the Generate FactSales Data Data Flow task that uses the SourceFactLoad.sql file located in D:\Labfiles\Starter.
3. Use a Data Conversion transformation to convert the following columns that will be loaded into the FactSales table in the AdventureWorksDWDev database:
Convert the ProductID integer data type to a Unicode string (25) with an output name of ProductIDMapping.
Convert the BusinessEntityID integer data type to a Unicode string (25) with an output name of ResellerIDMapping.
Convert the SalesOrderNumber to a Unicode string (20) with an output name of StringSalesOrderNumber.
Convert the SalesOrderLineNumber to a single-byte unsigned integer with an output name of TinyIntSalesOrderLineNumber.
Convert the UnitPriceDiscount column to a double-precision float data type with an output name of cnv_UnitPriceDiscount.
Convert the LineTotal column to a currency data type with an output name of cnv_LineTotal.
4. Add a Lookup Task within the Generate FactSales Data Data Flow task that will look up the Product Dimension Key based on the ProductAlternateKey.
5. Add a Lookup Task within the Generate FactSales Data Data Flow task that will look up the Reseller Dimension Key based on the BusinessEntityID.
6. Add a Raw File destination within the Generate FactSales Data Data Flow task that will be used as the error output for the ResellerKey Lookup task.
7. Add a Lookup Task within the Generate FactSales Data Data Flow task that will look up the Time Key based on the OrderDate column.
8. Add a Lookup Task within the Generate FactSales Data Data Flow task that will look up the Time Key based on the DueDate column.
9. Add a Lookup Task within the Generate FactSales Data Data Flow task that will look up the Time Key based on the ShipDate column.
10. Add an OLE DB Destination to the data flow and map the input columns correctly to the columns in the FactSales table of the AdventureWorksDWDev database.
Map the following Available Input Columns to the Available Destination Columns:
Available Input Columns Available Destination Column
ProductKey ProductKey
OrderDate Lookup.TimeKey OrderDateKey
DueDate Lookup.TimeKey DueDateKey
ShipDate Lookup.TimeKey ShipDateKey
ResellerKey ResellerKey
StringSalesOrderNumber SalesOrderNumber
TinyIntSalesOrderLineNumber SalesOrderLineNumber
RevisionNumber RevisionNumber
OrderQty OrderQuantity
UnitPrice UnitPrice
Cnv_UnitPriceDiscount DiscountAmount
StandardCost ProductStandardCost
Cnv_LineTotal SalesAmount
11. Save the AW_BI solution.
Task 11: Execute the LoadAWDW package that contains the Execute Package tasks that
control the load of the AdventureWorksDWDev data warehouse and review the data in
the database by using SQL Server Management Studio
1. In Business Intelligence Development Studio, execute the LoadAWDWDev package.
2. Save and close the AW_BI solution.
Task 12: You have completed all tasks in this exercise
A successful completion of this exercise results in the following outcomes:
You have opened Business Intelligence Development Studio and opened a data flow component within a package.
You have added an OLE DB data source within a data flow.
You have added a conditional split transformation to a data flow task.
You have added a sort transformation to a data flow task.
You have added and edited an OLE DB data destination within a data flow.
You have added and edited a slowly changing dimension transformation.
You have added and edited a lookup transformation to load a fact table with data within a data warehouse.
You have added and edited an Execute Package task to control the load of data into a data warehouse.
Lab Review
In this lab, you used data flows within an SSIS package to populate a simple data warehouse. You first
edited an existing package to add data sources and destinations and used common transformations to
complete the loading of the StageProduct table. Then, you implemented a data viewer in this package
and ran the package to ensure that data was loaded correctly into the StageProduct table.
You then created the dimension tables in the data warehouse focusing specifically on the Slowly
Changing Dimension task to manage changing data in the dimension tables. You finally explored how to
populate the fact table within the data warehouse by using the Lookup transformation to ensure that
the correct data was loaded into the fact table.
What is the purpose of Data Flow paths?
Data Flow paths are used to control the flow of data within the Data Flow task. You can define a success
Data Flow path, represented by a green arrow, which passes the data on to the next data flow
component. You can also use an error output Data Flow path to control the flow of data when an error
occurs.
What kind of errors can be managed by the error output Data Flow path?
Both data errors and truncation errors can be managed by the error output Data Flow path.
What data types does the Export Column transformation manage?
The DT_IMAGE, DT_TEXT and DT_NTEXT data types. The Export Column transformation moves this type
of data stored within a table to a file.
What is the difference between a Type 1 and a Type 2 Slowly Changing Dimension and how
are they represented in the Slowly Changing Dimension transformation?
Type 1 is a Slowly Changing Dimension that will overwrite data values within a dimension table. As a
result, no historical data is retained. In the Slowly Changing Dimension Wizard, this is referred to as a
Changing Attribute.
Type 2 Slowly Changing Dimension will insert a new record when the value in a dimension table
changes. As a result, historical data is retained. This is referred to as a Historical Attribute in the Slowly
Changing Dimension Wizard.
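The two behaviors can be sketched in T-SQL terms. This is an illustration of the effect of each attribute type, not the wizard's actual generated output; the table and column names (DimProduct, StartDate, EndDate) follow the lab's conventions but are assumptions.

```sql
-- Type 1 (Changing Attribute): overwrite the value in place; no history is kept.
UPDATE dbo.DimProduct
SET Category = @NewCategory
WHERE ProductAlternateKey = @BusinessKey;

-- Type 2 (Historical Attribute): expire the current row, then insert a new one,
-- so historical values are retained alongside the current value.
UPDATE dbo.DimProduct
SET EndDate = @Now
WHERE ProductAlternateKey = @BusinessKey
  AND EndDate IS NULL;

INSERT INTO dbo.DimProduct (ProductAlternateKey, EnglishProductName, StartDate, EndDate)
VALUES (@BusinessKey, @NewName, @Now, NULL);
```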
What is the difference between a Lookup and a Fuzzy Lookup transformation?
The Lookup transformation enables you to take information from an input column and then look up
additional information from another dataset that is linked to the input columns through a common
column.
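In relational terms, the Lookup transformation behaves like a join against the reference dataset. A sketch using the lab's fact load as an example; the staging source name (StagedSales) and join columns are assumptions for illustration only:

```sql
-- Approximate T-SQL equivalent of the ProductKey Lookup in the fact load:
-- each input row's converted product identifier is matched against the
-- dimension table to retrieve the surrogate ProductKey.
SELECT s.SalesOrderNumber,
       d.ProductKey
FROM dbo.StagedSales AS s
JOIN dbo.DimProduct AS d
    ON s.ProductIDMapping = d.ProductAlternateKey;
```

Rows with no match would go to the Lookup's error output, which is why the lab routes the ResellerKey Lookup's error output to a Raw File destination.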
The Fuzzy Lookup transformation applies approximate-matching logic to Lookup operations so that it
can return data from a dataset that closely, but not exactly, matches the Lookup value required.
Module Summary
Defining Data Sources and Destinations
In this lesson, you have learned the following key points:
The ETL operation uses data sources to retrieve the source data, transformations to change the
data and data destinations to load the data into a destination database.
The range of data flow sources that enable SSIS to connect to a wide variety of data sources
includes:
o OLEDB to connect to SQL Server, Microsoft Access 2007 and Microsoft Excel 2007
o Flat file to connect to text and csv files
o Raw file to connect to raw file sources created by raw file destinations
o Microsoft Excel to connect to Microsoft Office Excel 97 –2002
o XML to connect to XML data sources
o ADO.Net sources to connect to a database to create a datareader
The data flow destinations that are available in SSIS include:
o OLEDB to connect to SQL Server, Microsoft Access 2007 and Microsoft Excel 2007
o Flat file to connect to text and csv files
o Raw file to connect to raw file sources created by raw file destinations
o Microsoft Excel to connect to Microsoft Office Excel 97 – 2002
o XML to connect to XML data sources
o ADO.Net destinations to load data into a database by using a .NET provider
You can configure an OLE DB Data Source to retrieve data from SQL Server 2008 objects defining
a server name, authentication method and database name.
You can configure data sources for Access by using the OLEDB data source.
You can configure data sources for specific versions of Excel by using OLEDB and Microsoft Excel
data sources and destinations.
Data Flow Paths
In this lesson, you have learned the following key points:
Data flow paths can be used to control the flow of data flows and transformations in an SSIS
package using success data flow paths and error data flow paths.
You can create data flow paths and use them as inputs into other data flow components. In
addition, you can create error data flow outputs by clicking and dragging the data flow path
between different data flow components.
Data viewers help you to view the data before and after transformations take place to verify
that the transformations are working as expected.
The types of data viewers available to check the data within the data flow include:
o Grid that returns the data in rows and columns in a table
o Histogram works with numeric data only, allowing you to select one column from the
data flow
o Scatter Plot works with two numeric columns from a data source, providing the X-axis
and Y-axis of a chart
o Column Chart allows you to select one column from the data flow that presents a
column chart that shows the number of occurrences
You can create data viewers with SSIS to view the data flow as the package executes.
Implementing Data Flow Transformations: Part 1
In this lesson, you have learned the following key points:
Transformations in SSIS allow you to change the data as the data is being moved from a source
connection to a destination connection. They can also be used to standardize and cleanse the
data.
You can modify data by using the data formatting transformations, including:
o Character Map transformation for simple data transforms such as uppercase or
lowercase
o Data conversion transformation to convert data in the data flow
o Sort transformation to sort the data ascending or descending within the data flow
o Aggregate transformation that enables you to create a scalar result set or, in
conjunction with a Group By clause, return multiple results
You can manipulate column data by using column transformations, including:
o Copy transformation to copy data between a source and a destination
o Derived Column transformation to create a new column of data
o Import Column transformation to read data from files into columns in the data flow
o Export Column transformation to write column data from the data flow out to files
You can manage the data flow by using Multiple Data Flow transformations, including:
o Conditional Split transformation to separate data based on an expression that acts as a condition for the split
o Multicast transformation that enables you to generate multiple copies of the same data
o Merge transformation that enables you to merge sorted data
o Merge Join transformation that enables you to merge sorted data based on a join
condition
o Union All transformation that enables you to merge unsorted data
You can create custom data sources, destinations and data transformations by using Custom
transformations, including:
o Script transformation that allows you to create custom data sources, destinations and
data transformations using Visual Basic or C#
o OLE DB Command transformation to issue OLE DB commands
You can implement simple transformations in the Data Flow of SSIS.
You can use the Slowly Changing Dimension transformation to manage changing data within a
dimension table in a data warehouse.
Implementing Data Flow Transformations: Part 2
In this lesson, you have learned the following key points:
You can create Lookup and Cache transformations in SQL Server 2008. The Lookup
transformation helps you to take information from an input column and then look up additional
information from another dataset that is linked to the input columns through a common column,
which is useful for managing data in a data warehouse. The Cache transformation is used to
improve the performance of a Lookup transformation.
You can analyze data within the data flow by using Data Analysis transformations, including:
o Pivot transformation to create a crosstab result set
o Unpivot transformation to create a normalized result set
o Data Mining Query transformation to use data mining extension to perform data
analysis
You can create a sample of data using Data Sampling transformations, including:
o Percentage Sampling transformation to generate a sample of data based on a
percentage value
o Row Sampling transformation to generate a sample of data based on a set value
o Row Count transformation enables you to perform a row count of data and pass the
value to a variable
The Audit transformation is used to add metadata information to the data flow.
Fuzzy transformations can be used to help standardize data, including:
o Fuzzy Lookup to perform lookups of data against data that may not exactly match
o Fuzzy Grouping to group together rows that are candidates for being the same data
You can use Term transformations to extract nouns and noun phrases from within the data flow,
including:
o Term Extraction transformation
o Term Lookup transformation
Lab: Implementing Data Flow in SQL Server Integration Services 2008
In this lab, you used data flows within an SSIS package to populate a simple data warehouse. You first
edited an existing package to add data sources and destinations and used common transformations to
complete the loading of the StageProduct table. Then, you implemented a data viewer in this package
and ran the package to ensure that data was loaded correctly into the StageProduct table.
You then created the dimension tables in the data warehouse, focusing specifically on the Slowly
Changing Dimension transformation to manage changing data in the dimension tables. You finally
explored the ways to populate the fact table within the data warehouse by using the Lookup
transformation to ensure that the correct data was loaded into the fact table.
Glossary
.NET Framework
An integral Windows component that supports building, deploying and running the next generation of applications and Web services. It provides a highly productive, standards-based, multilanguage
environment for integrating existing investments with next generation applications and services, as well
as the agility to solve the challenges of deployment and operation of Internet-scale applications. The
.NET Framework consists of three main parts: the common language runtime, a hierarchical set of
unified class libraries and a componentized version of ASP called ASP.NET.
ad hoc report
An .rdl report created with Report Builder that accesses report models.
aggregation
A table or structure that contains precalculated data for a cube.
aggregation design
In Analysis Services, the process of defining how an aggregation is created.
aggregation prefix
A string that is combined with a system-defined ID to create a unique name for a partition's aggregation
table.
ancestor
A member in a superior level in a dimension hierarchy that is related through lineage to the current
member within the dimension hierarchy.
attribute
The building block of dimensions and their hierarchies that corresponds to a single column in a
dimension table.
attribute relationship
The hierarchy associated with an attribute containing a single level based on the corresponding column
in a dimension table.
axis
A set of tuples. Each tuple is a vector of members. A set of axes defines the coordinates of a
multidimensional data set.
ActiveX Data Objects
Component Object Model objects that provide access to data sources. This API provides a layer between
OLE DB and programming languages such as Visual Basic, Visual Basic for Applications, Active Server
Pages and Microsoft Internet Explorer Visual Basic Scripting.
ActiveX Data Objects (Multidimensional)
A high-level, language-independent set of object-based data access interfaces optimized for
multidimensional data applications.
ActiveX Data Objects MultiDimensional.NET
A managed data provider used to communicate with multidimensional data sources.
ADO MD
See Other Term: ActiveX Data Objects (Multidimensional)
ADOMD.NET
See Other Term: ActiveX Data Objects MultiDimensional.NET
AMO
See Other Term: Analysis Management Objects
Analysis Management Objects
The complete library of programmatically accessed objects that let an application manage a running
instance of Analysis Services.
balanced hierarchy
A dimension hierarchy in which all leaf nodes are the same distance from the root node.
calculated column
A column in a table that displays the result of an expression instead of stored data.
calculated field
A field, defined in a query, that displays the result of an expression instead of stored data.
calculated member
A member of a dimension whose value is calculated at run time by using an expression.
calculation condition
An MDX logical expression that is used to determine whether a calculation formula will be applied against
a cell in a calculation subcube.
calculation formula
An MDX expression used to supply a value for cells in a calculation subcube, subject to the application of
a calculation condition.
calculation pass
A stage of calculation in a multidimensional cube in which applicable calculations are evaluated.
calculation subcube
The set of multidimensional cube cells that is used to create a calculated cells definition. The set of cells
is defined by a combination of MDX set expressions.
case
In data mining, a case is an abstract view of data characterized by attributes and relations to other
cases.
case key
In data mining, the element of a case by which the case is referenced within a case set.
case set
In data mining, a set of cases.
cell
In a cube, the set of properties, including a value, specified by the intersection when one member is
selected from each dimension.
cellset
In ADO MD, an object that contains a collection of cells selected from cubes or other cellsets by a
multidimensional query.
changing dimension
A dimension that has a flexible member structure, and is designed to support frequent changes to
structure and data.
chart data region
A report item on a report layout that displays data in a graphical format.
child
A member in the next lower level in a hierarchy that is directly related to the current member.
clickthrough report
A report that displays related report model data when you click data within a rendered report builder
report.
clustering
A data mining technique that analyzes data to group records together according to their location within
the multidimensional attribute space.
collation
A set of rules that determines how data is compared, ordered and presented.
column-level collation
Supporting multiple collations in a single instance.
composite key
A key composed of two or more columns.
concatenation
The combining of two or more character strings or expressions into a single character string or
expression, or to combine two or more binary strings or expressions into a single binary string or
expression.
concurrency
A process that allows multiple users to access and change shared data at the same time. SQL Server uses
locking to allow multiple users to access and change shared data at the same time without conflicting
with each other.
conditional split
A data flow transformation that routes data rows to different outputs depending on the content of the
data, as determined by expressions evaluated against each row.
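A Conditional Split can be pictured as evaluating a list of expressions in order and sending each row to the first output whose expression is true, with remaining rows going to a default output. A hypothetical Python sketch (the output names and predicates are invented):

```python
# Sketch of a Conditional Split: each row goes to the first output whose
# predicate matches; anything left over goes to the default output.
def conditional_split(rows, cases):
    outputs = {name: [] for name, _ in cases}
    outputs["default"] = []
    for row in rows:
        for name, predicate in cases:
            if predicate(row):
                outputs[name].append(row)
                break
        else:
            # no predicate matched this row
            outputs["default"].append(row)
    return outputs

rows = [{"Amount": 50}, {"Amount": 500}, {"Amount": 5000}]
split = conditional_split(rows, [
    ("small", lambda r: r["Amount"] < 100),
    ("medium", lambda r: r["Amount"] < 1000),
])
```

As in SSIS, the order of the cases matters: a row that satisfies more than one expression is sent only to the first matching output.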
config file
See Other Term: configuration file
configuration
In reference to a single microcomputer, the sum of a system's internal and external components,
including memory, disk drives, keyboard, video and generally less critical add-on hardware, such as a
mouse, modem or printer.
configuration file
A file that contains machine-readable operating specifications for a piece of hardware or software, or
that contains information about another file or about a specific user.
configurations
In Integration Services, a name and value pair that updates the value of package objects when the
package is loaded.
connection
An interprocess communication (IPC) linkage established between a SQL Server application and an
instance of SQL Server.
connection manager
In Integration Services, a logical representation of a run-time connection to a data source.
constant
A group of symbols that represent a specific data value.
container
A control flow element that provides package structure.
control flow
The ordered workflow in an Integration Services package that performs tasks.
control-break report
A report that summarizes data in user-defined groups or breaks. A new group is triggered when
different data is encountered.
cube
A set of data that is organized and summarized into a multidimensional structure defined by a set of
dimensions and measures.
cube role
A collection of users and groups with the same access to a cube.
custom rollup
An aggregation calculation that is customized for a dimension level or member, and that overrides the
aggregate functions of a cube's measures.
custom rule
In a role, a specification that limits the dimension members or cube cells that users in the role are
permitted to access.
custom variable
An aggregation calculation that is customized for a dimension level or member and overrides the
aggregate functions of a cube's measures.
data dictionary
A set of system tables, stored in a catalog, that includes definitions of database structures and related
information, such as permissions.
data explosion
The exponential growth in size of a multidimensional structure, such as a cube, due to the storage of
aggregated data.
data flow
The ordered workflow in an Integration Services package that extracts, transforms and loads data.
data flow engine
An engine that executes the data flow in a package.
data flow task
Encapsulates the data flow engine that moves data between sources and destinations, providing the
facility to transform, clean and modify data as it is moved.
data integrity
A state in which all the data values stored in the database are correct.
data manipulation language
The subset of SQL statements that is used to retrieve and manipulate data.
data mart
A subset of the contents of a data warehouse.
data member
A child member associated with a parent member in a parent-child hierarchy.
data mining
The process of analyzing data to identify patterns or relationships.
data processing extension
A component in Reporting Services that is used to retrieve report data from an external data source.
data region
A report item that displays repeated rows of data from an underlying dataset in a table, matrix, list or
chart.
data scrubbing
Part of the process of building a data warehouse out of data coming from multiple online transaction
processing (OLTP) systems.
data source
In ADO and OLE DB, the location of a source of data exposed by an OLE DB provider.
The source of data for an object such as a cube or dimension. It is also the specification of the
information necessary to access source data. It sometimes refers to an object of ClassType clsDataSource.
In Reporting Services, a specified data source type, connection string and credentials, which can be
saved separately to a report server and shared among report projects or embedded in a .rdl file.
data source name
The name assigned to an ODBC data source.
data source view
A named selection of database objects that defines the schema referenced by OLAP and data mining
objects in an Analysis Services database.
data warehouse
A database specifically structured for query and analysis.
database role
A collection of users and groups with the same access to an Analysis Services database.
data-driven subscription
A subscription in Reporting Services that uses a query to retrieve subscription data from an external
data source at run time.
datareader
A stream of data that is returned by an ADO.NET query.
dataset
In OLE DB for OLAP, the set of multidimensional data that is the result of running an MDX SELECT
statement.
In Reporting Services, a named specification that includes a data source definition, a query definition
and options.
decision support
Systems designed to support the complex analysis required to discover business trends.
decision tree
A treelike model of data produced by certain data mining methods.
default member
The dimension member used in a query when no member is specified for the dimension.
delimited identifier
An object in a database that requires the use of special characters (delimiters) because the object name
does not comply with the formatting rules of regular identifiers.
delivery channel type
The protocol for a delivery channel, such as Simple Mail Transfer Protocol (SMTP) or File.
delivery extension
A component in Reporting Services that is used to distribute a report to specific devices or target
locations.
density
In an index, the frequency of duplicate values.
In a data file, a percentage that indicates how full a data page is.
In Analysis Services, the percentage of cells that contain data in a multidimensional structure.
dependencies
Objects that depend on other objects in the same database.
derived column
A transformation that creates new column values by applying expressions to transformation input
columns.
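A Derived Column can be pictured as evaluating an expression per row to produce new column values alongside the existing ones. A minimal Python sketch, with invented column names:

```python
# Sketch of a Derived Column transformation: new columns are produced by
# evaluating expressions over the existing input columns of each row.
def derive_columns(rows, expressions):
    out = []
    for row in rows:
        new_row = dict(row)
        for name, expr in expressions.items():
            new_row[name] = expr(row)   # expression over input columns
        out.append(new_row)
    return out

rows = [{"Unit": 4.0, "Qty": 3}]
derived = derive_columns(rows, {"Total": lambda r: r["Unit"] * r["Qty"]})
```

In SSIS the expressions are written in the Integration Services expression language rather than as functions, but the row-by-row evaluation is the same idea.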
descendant
A member in a dimension hierarchy that is related to a member of a higher level within the same
dimension.
destination
An Integration Services data flow component that writes the data from the data flow into a data source
or creates an in-memory dataset.
destination adapter
A data flow component that loads data into a data store.
dimension
A structural attribute of a cube, which is an organized hierarchy of categories (levels) that describe data
in the fact table.
dimension granularity
The lowest level available to a particular dimension in relation to a particular measure group.
dimension table
A table in a data warehouse whose entries describe data in a fact table. Dimension tables contain the
data from which dimensions are created.
discretized column
A column that represents finite, counted data.
document map
A navigation pane in a report arranged in a hierarchy of links to report sections and groups.
drillthrough
In Analysis Services, a technique to retrieve the detailed data from which the data in a cube cell was
summarized.
In Reporting Services, a way to open related reports by clicking hyperlinks in the main drillthrough
report.
drillthrough report
A report with the 'enable drilldown' option selected. Drillthrough reports contain hyperlinks to related
reports.
dynamic connection string
In Reporting Services, an expression that you build into the report, allowing the user to select which
data source to use at run time. You must build the expression and data source selection list into the
report when you create it.
Data Mining Model Training
The process a data mining model uses to estimate model parameters by evaluating a set of known and
predictable data.
entity
In Reporting Services, an entity is a logical collection of model items, including source fields, roles,
folders and expressions, presented in familiar business terms.
executable
In Integration Services, a package, Foreach Loop, For Loop, Sequence or task.
execution tree
The path of data in the data flow of a SQL Server 2008 Integration Services package from sources
through transformations to destinations.
expression
In SQL, a combination of symbols and operators that evaluate to a single data value.
In Integration Services, a combination of literals, constants, functions and operators that evaluate to a
single data value.
ETL
Extraction, transformation and loading. The complex process of copying and cleaning data from
heterogeneous sources.
fact
A row in a fact table in a data warehouse. A fact contains values that define a data event such as a sales
transaction.
fact dimension
A relationship between a dimension and a measure group in which the dimension main table is the
same as the measure group table.
fact table
A central table in a data warehouse schema that contains numerical measures and keys relating facts to
dimension tables.
field length
In bulk copy, the maximum number of characters needed to represent a data item in a bulk copy
character format data file.
field terminator
In bulk copy, one or more characters marking the end of a field or row, separating one field or row in the
data file from the next.
filter expression
An expression used for filtering data in the Filter operator.
flat file
A file consisting of records of a single record type, in which there is no embedded structure information
governing relationships between records.
flattened rowset
A multidimensional data set presented as a two-dimensional rowset in which unique combinations of
elements of multiple dimensions are combined on an axis.
folder hierarchy
A bounded namespace that uniquely identifies all reports, folders, shared data source items and
resources that are stored in and managed by a report server.
format file
A file containing meta information (such as data type and column size) that is used to interpret data
when it is read from or written to a data file.
File connection manager
In Integration Services, a logical representation of a connection that enables a package to reference an
existing file or folder or to create a file or folder at run time.
For Loop container
In Integration Services, a container that runs a control flow repeatedly by testing a condition.
Foreach Loop container
In Integration Services, a container that runs a control flow repeatedly by using an enumerator.
Fuzzy Grouping
In Integration Services, a data cleaning methodology that examines values in a dataset and identifies
groups of related data rows and the one data row that is the canonical representation of the group.
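As a rough illustration of the idea (not the actual SSIS algorithm, which uses token-based similarity scoring), the sketch below groups strings by a similarity threshold using Python's standard difflib, treating the first member of each group as its canonical representative:

```python
import difflib

# Sketch of Fuzzy Grouping: values whose similarity to a group's canonical
# value exceeds a threshold join that group; otherwise they start a new one.
def fuzzy_group(values, threshold=0.8):
    groups = []   # list of (canonical value, list of members)
    for value in values:
        for group in groups:
            ratio = difflib.SequenceMatcher(None, value.lower(),
                                            group[0].lower()).ratio()
            if ratio >= threshold:
                group[1].append(value)
                break
        else:
            groups.append((value, [value]))
    return groups

groups = fuzzy_group(["Contoso Ltd", "Contoso Ltd.", "Fabrikam"])
```

The threshold plays the same role as the similarity threshold configured on the Fuzzy Grouping transformation: lower values merge more aggressively, higher values keep near-duplicates apart.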
global assembly cache
A machine-wide code cache that stores assemblies specifically installed to be shared by many
applications on the computer.
grant
To apply permissions to a user account, which allows the account to perform an activity or work with
data.
granularity
The degree of specificity of information that is contained in a data element.
granularity attribute
The single attribute used to specify the level of granularity for a given dimension in relation to a given
measure group.
grid
A view type that displays data in a table.
grouping
A set of data that is grouped together in a report.
hierarchy
A logical tree structure that organizes the members of a dimension such that each member has one
parent member and zero or more child members.
hybrid OLAP
A storage mode that uses a combination of multidimensional data structures and relational database
tables to store multidimensional data.
HTML Viewer
A UI component consisting of a report toolbar and other navigation elements used to work with a
report.
input member
A member whose value is loaded directly from the data source instead of being calculated from other
data.
input set
The set of data provided to a MDX value expression upon which the expression operates.
isolation level
The property of a transaction that controls the degree to which data is isolated for use by one process,
and is guarded against interference from other processes. Setting the isolation level defines the default
locking behavior for all SELECT statements in your SQL Server session.
item-level role assignment
A security policy that applies to an item in the report server folder namespace.
item-level role definition
A security template that defines a role used to control access to or interaction with an item in the report
server folder namespace.
key
A column or group of columns that uniquely identifies a row (primary key), defines the relationship
between two tables (foreign key) or is used to build an index.
key attribute
The attribute of a dimension that links the non-key attributes in the dimension to related measures.
key column
In an Analysis Services dimension, an attribute property that uniquely identifies the attribute members.
In an Analysis Services mining model, a data mining column that uniquely identifies each case in a case
table.
key performance indicator
A quantifiable, standardized metric that reflects a critical business variable (for instance, market share),
measured over time.
KPI
See Other Term: key performance indicator
latency
The amount of time that elapses when a data change is completed at one server and when that change
appears at another server.
leaf
In a tree structure, an element that has no subordinate elements.
leaf level
The bottom level of a clustered or nonclustered index.
leaf member
A dimension member without descendants.
level
The name of a set of members in a dimension hierarchy such that all members of the set are at the same
distance from the root of the hierarchy.
lift chart
In Analysis Services, a chart that compares the accuracy of the predictions of each data mining model in
the comparison set.
linked dimension
In Analysis Services, a reference in a cube to a dimension in a different cube.
linked measure group
In Analysis Services, a reference in a cube to a measure group in a different cube.
linked report
A report that references an existing report definition by using a different set of parameter values or
properties.
list data region
A report item on a report layout that displays data in a list format.
local cube
A cube created and stored with the extension .cub on a local computer using PivotTable Service.
lookup table
In Integration Services, a reference table for comparing, matching or extracting data.
many-to-many dimension
A relationship between a dimension and a measure group in which a single fact may be associated with
many dimension members and a single dimension member may be associated with many facts.
matrix data region
A report item on a report layout that displays data in a variable columnar format.
measure
In a cube, a set of values that are usually numeric and are based on a column in the fact table of the
cube. Measures are the central values that are aggregated and analyzed.
measure group
All the measures in a cube that derive from a single fact table in a data source view.
member
An item in a dimension representing one or more occurrences of data.
member property
Information about an attribute member, for example, the gender of a customer member or the color of
a product member.
mining structure
A data mining object that defines the data domain from which the mining models are built.
multidimensional OLAP
A storage mode that uses a proprietary multidimensional structure to store a partition's facts and
aggregations or a dimension.
multidimensional structure
A database paradigm that treats data as cubes that contain dimensions and measures in cells.
MDX
A syntax used for defining multidimensional objects and querying and manipulating multidimensional
data.
Mining Model
An object that contains the definition of a data mining process and the results of the training activity.
Multidimensional Expression
A syntax used for defining multidimensional objects and querying and manipulating multidimensional
data.
named set
A set of dimension members or a set expression that is created for reuse, for example, in MDX queries.
natural hierarchy
A hierarchy in which at every level there is a one-to-many relationship between members in that level
and members in the next lower level.
nested table
A data mining model configuration in which a column of a table contains a table.
nonleaf
In a tree structure, an element that has one or more subordinate elements. In Analysis Services, a
dimension member that has one or more descendants. In SQL Server indexes, an intermediate index
node that points to other intermediate nodes or leaf nodes.
nonleaf member
A member with one or more descendants.
normalization rules
A set of database design rules that minimizes data redundancy and results in a database in which the
Database Engine and application software can easily enforce integrity.
Non-scalable EM
A Microsoft Clustering algorithm method that uses a probabilistic method to determine the probability
that a data point exists in a cluster.
Non-scalable K-means
A Microsoft Clustering algorithm method that uses a distance measure to assign a data point to its
closest cluster.
object identifier
A unique name given to an object.
In Metadata Services, a unique identifier constructed from a globally unique identifier (GUID) and an
internal identifier.
online analytical processing
A technology that uses multidimensional structures to provide rapid access to data for analysis.
online transaction processing
A data processing system designed to record all of the business transactions of an organization as they
occur. An OLTP system is characterized by many concurrent users actively adding and modifying data.
overfitting
The characteristic of some data mining algorithms that assigns importance to random variations in data
by viewing them as important patterns.
ODBC data source
The location of a set of data that can be accessed using an ODBC driver.
A stored definition that contains all of the connection information an ODBC application requires to
connect to the data source.
ODBC driver
A dynamic-link library (DLL) that an ODBC-enabled application, such as Excel, can use to access an ODBC
data source.
OLAP
See Other Term: online analytical processing
OLE DB
A COM-based API for accessing data. OLE DB supports accessing data stored in any format for which an
OLE DB provider is available.
OLE DB for OLAP
Formerly, the separate specification that addressed OLAP extensions to OLE DB. Beginning with OLE DB
2.0, OLAP extensions are incorporated into the OLE DB specification.
package
A collection of control flow and data flow elements that runs as a unit.
padding
A string, typically added when the last plaintext block is short.
The space allotted in a cell to create or maintain a specific size.
parameterized report
A published report that accepts input values through parameters.
parent
A member in the next higher level in a hierarchy that is directly related to the current member.
partition
In replication, a subset of rows from a published table, created with a static row filter or a
parameterized row filter.
In Analysis Services, one of the storage containers for data and aggregations of a cube. Every cube
contains one or more partitions. For a cube with multiple partitions, each partition can be stored
separately in a different physical location. Each partition can be based on a different data source.
Partitions are not visible to users; the cube appears to be a single object.
In the Database Engine, a unit of a partitioned table or index.
partition function
A function that defines how the rows of a partitioned table or index are spread across a set of partitions
based on the values of certain columns, called partitioning columns.
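Conceptually, a partition function maps each value of the partitioning column to a 1-based partition number by comparing it against an ordered list of boundary values. The Python sketch below illustrates RANGE RIGHT semantics, where each boundary value belongs to the partition on its right; it is an illustration of the mapping, not SQL Server's implementation:

```python
import bisect

# Sketch of a RANGE RIGHT partition function: with boundaries [10, 20],
# values below 10 land in partition 1, values from 10 up to (but not
# including) 20 land in partition 2, and values of 20 or more in partition 3.
def partition_number(boundaries, value):
    # bisect_right counts boundaries <= value, so a value equal to a
    # boundary falls into the partition to that boundary's right
    return bisect.bisect_right(boundaries, value) + 1
```

A RANGE LEFT function, where each boundary value belongs to the partition on its left, would use `bisect.bisect_left` instead.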
partition scheme
A database object that maps the partitions of a partition function to a set of filegroups.
partitioned index
An index built on a partition scheme, and whose data is horizontally divided into units which may be
spread across more than one filegroup in a database.
partitioned snapshot
In merge replication, a snapshot that includes only the data from a single partition.
partitioned table
A table built on a partition scheme, and whose data is horizontally divided into units which may be
spread across more than one filegroup in a database.
partitioning
The process of replacing a table with multiple smaller tables.
partitioning column
The column of a table or index that a partition function uses to partition a table or index.
perspective
A user-defined subset of a cube.
pivot
To rotate rows to columns, and columns to rows, in a crosstabular data browser.
To choose dimensions from the set of available dimensions in a multidimensional data structure for
display in the rows and columns of a crosstabular structure.
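The first sense of pivot above, rotating rows into columns, can be sketched as building a crosstab keyed by a row attribute and a column attribute. A minimal Python illustration with invented column names:

```python
# Sketch of a pivot: rows of (row key, column key, value) become a crosstab
# in which the distinct column-key values turn into columns.
def pivot(rows, row_key, col_key, value):
    table = {}
    for r in rows:
        table.setdefault(r[row_key], {})[r[col_key]] = r[value]
    return table

rows = [
    {"Product": "Chain", "Year": 2007, "Sales": 10},
    {"Product": "Chain", "Year": 2008, "Sales": 12},
    {"Product": "Pedal", "Year": 2008, "Sales": 7},
]
crosstab = pivot(rows, "Product", "Year", "Sales")
```

The Unpivot transformation performs the reverse mapping, turning each cell of such a crosstab back into a normalized (row key, column key, value) row.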
polling query
A polling query is typically a singleton query that returns a value Analysis Services can use to determine
if changes have been made to a table or other relational object.
precedence constraint
A control flow element that connects tasks and containers into a sequenced workflow.
predictable column
A data mining column that the algorithm will build a model around based on values of the input
columns.
prediction
A data mining technique that analyzes existing data and uses the results to predict values of attributes
for new records or missing attributes in existing records.
proactive caching
A system that manages data obsolescence in a cube by which objects in MOLAP storage are
automatically updated and processed in cache while queries are redirected to ROLAP storage.
process
In a cube, to populate a cube with data and aggregations.
In a data mining model, to populate a data mining model with data mining content.
profit chart
In Analysis Services, a chart that displays the theoretical increase in profit that is associated with using
each model.
properties page
A dialog box that displays information about an object in the interface.
property
A named attribute of a control, field or database object that you set to define one of the object's
characteristics, such as size, color or screen location; or an aspect of its behavior, such as whether it is
hidden.
property mapping
A mapping between a variable and a property of a package element.
property page
A tabbed dialog box where you can identify the characteristics of tables, relationships, indexes,
constraints and keys.
protection level
In Integration Services, determines the protection method, the password or user key and the scope of package protection.
ragged hierarchy
See Other Term: unbalanced hierarchy
raw file
In Integration Services, a native format for fast reading and writing of data to files.
recursive hierarchy
A hierarchy of data in which all parent-child relationships are represented in the data.
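A recursive hierarchy is typically stored as a self-referencing table and can be traversed with a recursive common table expression. A sketch assuming a hypothetical Employee table in which ManagerID references EmployeeID in the same table:

```sql
-- Walk a parent-child hierarchy from the root downward, tracking depth.
WITH OrgTree AS
(
    SELECT EmployeeID, ManagerID, Name, 0 AS Depth
    FROM Employee
    WHERE ManagerID IS NULL            -- anchor: the root member(s)
    UNION ALL
    SELECT e.EmployeeID, e.ManagerID, e.Name, t.Depth + 1
    FROM Employee AS e
    JOIN OrgTree AS t ON e.ManagerID = t.EmployeeID   -- recursive step
)
SELECT EmployeeID, ManagerID, Name, Depth
FROM OrgTree;
```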
reference dimension
A relationship between a dimension and a measure group in which the dimension is coupled to the
measure group through another dimension. This behaves like a snowflake dimension, except that
attributes are not shared between the two dimensions.
reference table
The source table to use in fuzzy lookups.
refresh data
The series of operations that clears data from a cube, loads the cube with new data from the data
warehouse and calculates aggregations.
relational database
A database or database management system that stores information in tables as rows and columns of
data, and conducts searches by using the data in specified columns of one table to find additional data
in another table.
relational database management system
A system that organizes data into related rows and columns.
relational OLAP
A storage mode that uses tables in a relational database to store multidimensional structures.
rendered report
A fully processed report that contains both data and layout information, in a format suitable for viewing.
rendering
A component in Reporting Services that is used to process the output format of a report.
rendering extension(s)
A plug-in that renders reports to a specific format.
rendering object model
Report object model used by rendering extensions.
replay
In SQL Server Profiler, the ability to open a saved trace and play it again.
report definition
The blueprint for a report before the report is processed or rendered. A report definition contains
information about the query and layout for the report.
report execution snapshot
A report snapshot that is cached.
report history
A collection of report snapshots that are created and saved over time.
report history snapshot
A report snapshot that appears in report history.
report intermediate format
A static report history that contains data captured at a specific point in time.
report item
Any object, such as a text box, graphical element or data region, that exists on a report layout.
report layout
In report designer, the placement of fields, text and graphics within a report.
In report builder, the placement of fields and entities within a report, plus applied formatting styles.
report layout template
A predesigned table, matrix or chart report template in report builder.
report link
A URL to a hyperlinked report.
report model
A metadata description of business data used for creating ad hoc reports in report builder.
report processing extension
A component in Reporting Services that is used to extend the report processing logic.
report rendering
The action of combining the report layout with the data from the data source for the purpose of viewing
the report.
report server database
A database that provides internal storage for a report server.
report server execution account
The account under which the Report Server Web service and Report Server Windows service run.
report server folder namespace
A hierarchy that contains predefined and user-defined folders. The namespace uniquely identifies
reports and other items that are stored in a report server. It provides an addressing scheme for
specifying reports in a URL.
report snapshot
A static report that contains data captured at a specific point in time.
report-specific schedule
Schedule defined inline with a report.
resource
Any item in a report server database that is not a report, folder or shared data source item.
role
A SQL Server security account that is a collection of other security accounts that can be treated as a
single unit when managing permissions. A role can contain SQL Server logins, other roles, and Windows
logins or groups.
In Analysis Services, a role uses Windows security accounts to limit scope of access and permissions
when users access databases, cubes, dimensions and data mining models.
In a database mirroring session, the principal server and mirror server perform complementary principal
and mirror roles. Optionally, the role of witness is performed by a third server instance.
role assignment
Definition of user access rights to an item.
In Reporting Services, a security policy that determines whether a user or group can access a specific
item and perform an operation.
role definition
A collection of tasks performed by a user (e.g. browser, administrator).
In Reporting Services, a named collection of tasks that defines the operations a user can perform on a
report server.
role-playing dimension
A single database dimension joined to the fact table on different foreign keys to produce multiple cube
dimensions.
RDBMS
See Other Term: relational database management system
RDL
See Other Term: Report Definition Language
Report Definition Language
A set of instructions that describe layout and query information for a report.
Report Server service
A Windows service that contains all the processing and management capabilities of a report server.
Report Server Web service
A Web service that hosts, processes and delivers reports.
ReportViewer controls
A Web server control and Windows Form control that provides embedded report processing in ASP.NET
and Windows Forms applications.
scalar
A single-value field, as opposed to an aggregate.
scalar aggregate
An aggregate function, such as MIN(), MAX() or AVG(), that is specified in a SELECT statement column list
that contains only aggregate functions.
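Because the column list contains only aggregates, such a query returns exactly one row. A minimal sketch against a hypothetical Products table:

```sql
-- Scalar aggregates: no GROUP BY, so the result is a single summary row.
SELECT MIN(UnitPrice) AS MinPrice,
       MAX(UnitPrice) AS MaxPrice,
       AVG(UnitPrice) AS AvgPrice
FROM Products;
```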
scale bar
The line on a linear gauge on which tick marks are drawn analogous to an axis on a chart.
scope
An extent to which a variable can be referenced in a DTS package.
script
A collection of Transact-SQL statements used to perform an operation.
security extension
A component in Reporting Services that authenticates a user or group to a report server.
semiadditive
A measure that can be summed along one or more, but not all, dimensions in a cube.
serializable
The highest transaction isolation level. Serializable transactions lock all rows they read or modify to
ensure the transaction is completely isolated from other tasks.
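In T-SQL the level is requested per session; the sketch below uses a hypothetical Orders table. Range locks taken under SERIALIZABLE also prevent phantom reads, i.e. rows appearing in a range the transaction has already scanned:

```sql
-- Request the strictest isolation level for the statements that follow.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
    -- This count cannot change (no inserts, updates or deletes in the
    -- qualifying range) until the transaction commits or rolls back.
    SELECT COUNT(*) FROM Orders WHERE CustomerID = 42;
COMMIT TRANSACTION;
```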
server
A network location from which report builder is launched and where a report is saved, managed and published.
server admin
A user with elevated privileges who can access all settings and content of a report server.
server aggregate
An aggregate value that is calculated on the data source server and included in a result set by the data
provider.
shared data source item
Data source connection information that is encapsulated in an item.
shared dimension
A dimension created within a database that can be used by any cube in the database.
shared schedule
Schedule information that can be referenced by multiple items.
sibling
A member in a dimension hierarchy that is a child of the same parent as a specified member.
slice
A subset of the data in a cube, specified by limiting one or more dimensions by members of the
dimension.
smart tag
A smart tag exposes key configurations directly on the design surface to enhance overall design-time
productivity in Visual Studio 2005.
snowflake schema
An extension of a star schema such that one or more dimensions are defined by multiple tables.
source
An Integration Services data flow component that extracts data from a data store, such as files and
databases.
source control
A way of storing and managing different versions of source code files and other files used in software
development projects. Also known as configuration management and revision control.
source cube
The cube on which a linked cube is based.
source database
In data warehousing, the database from which data is extracted for use in the data warehouse.
A database on the Publisher from which data and database objects are marked for replication as part of
a publication that is propagated to Subscribers.
source object
The single object to which all objects in a particular collection are connected by way of relationships that
are all of the same relationship type.
source partition
An Analysis Services partition that is merged into another and is deleted automatically at the end of the
merger process.
sparsity
The relative percentage of a multidimensional structure's cells that do not contain data.
star join
A join between a fact table (typically a large fact table) and at least two dimension tables.
star query
A star query joins a fact table and a number of dimension tables.
star schema
A relational database structure in which data is maintained in a single fact table at the center of the
schema with additional dimension data stored in dimension tables.
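A star query over such a schema is simply the fact table joined to its dimension tables on their surrogate keys. A sketch using hypothetical FactSales, DimDate and DimProduct tables:

```sql
-- A typical star join: one fact table, two dimension tables,
-- aggregation over the fact measure.
SELECT d.CalendarYear,
       p.ProductName,
       SUM(f.SalesAmount) AS TotalSales
FROM FactSales AS f
JOIN DimDate    AS d ON f.DateKey    = d.DateKey
JOIN DimProduct AS p ON f.ProductKey = p.ProductKey
GROUP BY d.CalendarYear, p.ProductName;
```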
subreport
A report contained within another report.
subscribing server
A server running an instance of Analysis Services that stores a linked cube.
subscription
A request for a copy of a publication to be delivered to a Subscriber.
subscription database
A database at the Subscriber that receives data and database objects published by a Publisher.
subscription event rule
A rule that processes information for event-driven subscriptions.
subscription scheduled rule
One or more Transact-SQL statements that process information for scheduled subscriptions.
Secure Sockets Layer (SSL)
A proposed open standard for establishing a secure communications channel to prevent the
interception of critical information, such as credit card numbers. Primarily, it enables secure electronic
financial transactions on the World Wide Web, although it is designed to work on other Internet services
as well.
Semantic Model Definition Language
A set of instructions that describe layout and query information for reports created in report builder.
Sequence container
Defines a control flow that is a subset of the package control flow.
table data region
A report item on a report layout that displays data in a columnar format.
tablix
A Reporting Services RDL data region that contains rows and columns resembling a table or matrix,
possibly sharing characteristics of both.
target partition
An Analysis Services partition into which another is merged, and which contains the data of both
partitions after the merger.
temporary stored procedure
A procedure placed in the temporary database, tempdb, and erased at the end of the session.
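A local temporary stored procedure is created by prefixing the name with #; it lives in tempdb and is dropped automatically when the session ends. A minimal sketch (the procedure name is arbitrary):

```sql
-- The leading # makes this a session-scoped temporary procedure.
CREATE PROCEDURE #GetObjectCount
AS
    SELECT COUNT(*) AS ObjectCount FROM sys.objects;
GO

EXEC #GetObjectCount;
```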
time dimension
A dimension that breaks time down into levels such as Year, Quarter, Month and Day.
In Analysis Services, a special type of dimension created from a date/time column.
transformation
In data warehousing, the process of changing data extracted from source data systems into
arrangements and formats consistent with the schema of the data warehouse.
In Integration Services, a data flow component that aggregates, merges, distributes and modifies column data and rowsets.
transformation error output
Information about a transformation error.
transformation input
Data that is contained in a column, which is used during a join or lookup process, to modify or aggregate
data in the table to which it is joined.
transformation output
Data that is returned as a result of a transformation procedure.
tuple
Uniquely identifies a cell, based on a combination of attribute members from every attribute hierarchy
in the cube.
two-phase commit
A process that ensures transactions that apply to more than one server are completed on all servers or
on none.
unbalanced hierarchy
A hierarchy in which one or more levels do not contain members in one or more branches of the
hierarchy.
unknown member
A member of a dimension for which no key is found during processing of a cube that contains the
dimension.
unpivot
In Integration Services, the process of creating a more normalized dataset by expanding data columns in a single record into multiple records.
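The same normalization can be expressed with the T-SQL UNPIVOT operator, the inverse of the PIVOT rotation. A sketch using a hypothetical SalesWide table with one column per quarter:

```sql
-- Expand the Q1..Q4 columns of each row into one (Qtr, Amount) row apiece.
SELECT Product, Qtr, Amount
FROM SalesWide
UNPIVOT (Amount FOR Qtr IN ([Q1], [Q2], [Q3], [Q4])) AS unpvt;
```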
value
An expression in MDX that returns a value. Value expressions can operate on sets, tuples, members,
levels, numbers or strings.
variable interval
An option on a Reporting Services chart that can be specified to automatically calculate the optimal
number of labels that can be placed on an axis, based on the chart width or height.
vertical partitioning
To segment a single table into multiple tables based on selected columns.
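A vertically partitioned design keeps the frequently used columns in one narrow table and moves wide, rarely used columns to a companion table sharing the same primary key. A sketch with hypothetical customer tables:

```sql
-- Core columns stay in the narrow, hot table.
CREATE TABLE CustomerCore
(
    CustomerID int PRIMARY KEY,
    Name       nvarchar(100) NOT NULL
);

-- Wide, rarely read columns move to a 1:1 companion table.
CREATE TABLE CustomerExtended
(
    CustomerID int PRIMARY KEY
        REFERENCES CustomerCore (CustomerID),
    Notes      nvarchar(max) NULL
);
```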
very large database
A database that has become large enough to be a management challenge, requiring extra attention to
people, processes and procedures.
visual total
A displayed, aggregated cell value for a dimension member that is consistent with the displayed cell
values for its displayed children.
VLDB
See Other Term: very large database
write back
To update a cube cell value, member or member property value.
write enable
To change a cube or dimension so that users in cube roles with read/write access to the cube or
dimension can change its data.
writeback
In SQL Server, the update of a cube cell value, member or member property value.
Web service
In Reporting Services, a service that uses Simple Object Access Protocol (SOAP) over HTTP and acts as a
communications interface between client programs and the report server.
XML for Analysis
A specification that describes an open standard that supports data access to data sources that reside on
the World Wide Web.
XMLA
See Other Term: XML for Analysis