Vinod Kumar M Technology Evangelist Microsoft www...

Post on 11-Jul-2020

Vinod Kumar M

Technology Evangelist – DB and BI

Microsoft

www.ExtremeExperts.com

Objectives and Takeaways

A high-level view

Design considerations

How to measure performance

Performance implications of architecture

Manageability aspects of SSIS

Deployment tips

Out of scope

Prescriptive guidance for specific situations

Agenda

Quick Introduction

Understanding Buffers and Memory

OVAL Concept Detailed

Component Specific Notes

Manageability Features

Deployment Considerations

Introduction

SSIS Life Cycle tools

Design the SSIS Package

Business Intelligence Development Studio (Visual Studio)

Migration wizard for pre SQL 2005 packages

Version Control Integration (VSS)

Deployment/Execution

Deployment Utility to copy packages

Command Line execution (dtexec.exe and dtexecui.exe)

Flexible Configuration Options

Supportability

Rich per-package logging

SQL Server Management Studio for monitoring running packages and organizing stored packages

Checkpoint - Restartability

Deep dive - Performance

Buffers and Memory

Buffers based on design-time metadata

The width of a row determines the size of the buffer

Smaller rows = more rows in memory = greater efficiency

Memory copies are expensive!

A buffer might have placeholder columns filled by downstream components

Pointer magic where possible
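To make the row-width point concrete, here is a minimal sketch of how row width limits the number of rows per buffer. It assumes the documented Data Flow task defaults for DefaultBufferSize (10 MB) and DefaultBufferMaxRows (10,000); the function name and the simple division are illustrative only, and the real engine's sizing logic is more involved.

```python
# Documented defaults for the SSIS Data Flow task properties.
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024   # DefaultBufferSize, in bytes
DEFAULT_BUFFER_MAX_ROWS = 10_000         # DefaultBufferMaxRows


def rows_per_buffer(row_width_bytes: int,
                    buffer_size: int = DEFAULT_BUFFER_SIZE,
                    max_rows: int = DEFAULT_BUFFER_MAX_ROWS) -> int:
    """Estimate rows per buffer under a simplified model (hypothetical helper)."""
    if row_width_bytes <= 0:
        raise ValueError("row width must be positive")
    # A buffer holds as many rows as fit in its byte budget, capped by max_rows.
    return min(max_rows, buffer_size // row_width_bytes)


if __name__ == "__main__":
    for width in (200, 2_000, 20_000):
        print(f"{width:>6} bytes/row -> {rows_per_buffer(width)} rows per buffer")
```

Under this model, narrow rows hit the row cap while wide rows are limited by the byte budget, which is why dropping unneeded columns lets each buffer carry more rows.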

Component Types

Row based (synchronous outputs)

Logically works at a row level

Buffer reused

Examples: Data Conversion, Derived Column

Partially blocking (asynchronous outputs)

May logically work at a row level

Data copied to new buffers

Examples: Merge, Merge Join, Union All

Blocking (asynchronous outputs)

Needs all input buffers before producing any output rows

Data copied to new buffers

Examples: Aggregate, Sort
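The three component types above can be mimicked with plain Python iterators. This is an analogy only, not SSIS code: the function names and the row layout (dicts with qty, price, id) are assumptions made for the illustration.

```python
import heapq
from typing import Iterable, Iterator


def derived_column(rows: Iterable[dict]) -> Iterator[dict]:
    """Row based: each row is updated in place and passed on immediately."""
    for row in rows:
        row["total"] = row["qty"] * row["price"]  # same row object is reused
        yield row


def merge_sorted(left: Iterable[dict], right: Iterable[dict], key: str) -> Iterator[dict]:
    """Partially blocking: output starts before either sorted input is exhausted."""
    return heapq.merge(left, right, key=lambda r: r[key])


def sort_rows(rows: Iterable[dict], key: str) -> Iterator[dict]:
    """Blocking: every input row must arrive before the first output row."""
    return iter(sorted(rows, key=lambda r: r[key]))


if __name__ == "__main__":
    data = [{"id": 2, "qty": 1, "price": 5}, {"id": 1, "qty": 2, "price": 3}]
    for row in sort_rows(derived_column(data), "id"):
        print(row)
```

The row-based step never copies a row, while the sort has to hold the whole input in memory, which mirrors why blocking components are the expensive ones.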

CPU Utilization

Execution Tree

Starts from a source or an async output

Ends at a destination or an input that has no sync outputs

Each Execution Tree can get a worker thread

MaxEngineThreads to control parallelism
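As a rough sketch of how a data flow splits into execution trees, the function below cuts a linear pipeline at every component with an asynchronous output. The linear-pipeline restriction, the tuple layout and the function name are assumptions for illustration; real data flows can branch and the engine's planning is more detailed.

```python
from typing import List, Tuple

# Each component is (name, has_async_output). The pipeline is assumed linear,
# with the source first and the destination last.
Component = Tuple[str, bool]


def execution_trees(pipeline: List[Component]) -> List[List[str]]:
    """Split a linear pipeline into execution trees (hypothetical helper)."""
    trees: List[List[str]] = []
    current: List[str] = []
    for index, (name, has_async_output) in enumerate(pipeline):
        is_last = index == len(pipeline) - 1
        if has_async_output and current and not is_last:
            # The input side of an async component ends the current tree;
            # its output side starts the next one.
            current.append(f"{name} (input)")
            trees.append(current)
            current = [f"{name} (output)"]
        else:
            current.append(name)
    if current:
        trees.append(current)
    return trees


if __name__ == "__main__":
    flow = [("Flat File Source", False), ("Derived Column", False),
            ("Sort", True), ("Lookup", False), ("OLE DB Destination", False)]
    for tree in execution_trees(flow):
        print(" -> ".join(tree))
```

In this example the Sort splits the flow into two trees, each of which could be given its own worker thread.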

Performance Strategy

Use OVAL to identify the factors affecting data integration performance…

Operations

What logic should be applied to the data?

Volume

How much data must be processed?

Application

Which app is best suited to these operations on this volume of data? For example, use SQL Server or SSIS for sorting data?

Location

Where should the app run? For example, on a shared server, or on a standalone machine?

An OVAL Example—Loading a Text File

Simple scenario…

Interesting performance considerations!

Text file on Server 1 SQL Server on Server 2

Understand all operations performed

Operations

1. Open a transaction on SQL Server

2. Read data from the text file

3. Load data into the SSIS data flow

4. Load the data into SQL Server

5. Commit the transaction

Beware of hidden operations

Data conversion in either step 3 or 4
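The five steps, including the hidden conversion, can be sketched in a few lines. This is an analogy, not SSIS: it assumes an in-memory SQLite database as a stand-in for SQL Server, a two-column comma-separated file, and a hypothetical table named sales.

```python
import csv
import io
import sqlite3


def load_text(text: str, conn: sqlite3.Connection) -> int:
    """Load comma-separated id,amount lines into a sales table; return row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, amount REAL)")
    # Steps 1 and 5: sqlite3 opens a transaction implicitly before the INSERT;
    # the with block commits on success and rolls back on an error.
    with conn:
        reader = csv.reader(io.StringIO(text))               # step 2: read the text
        # Step 3: the hidden operation, every field is converted from a string.
        rows = [(int(fields[0]), float(fields[1])) for fields in reader]
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)  # step 4: load
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(load_text("1,9.50\n2,3.25\n", connection), "rows loaded")
```

Here the conversion happens on the client before the insert; leaving the values as strings would push the same work to the database in step 4 instead.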

Volume

Reduce where possible

Don’t push unneeded columns

Conditional split for filtering rows

Do not parse or convert columns unnecessarily

In a fixed-width format you can combine adjacent unneeded columns into one

Leave unneeded columns as strings
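A minimal sketch of these volume-reduction ideas, assuming a comma-separated input whose first two columns (id, amount) are the only ones needed. The function name and threshold parameter are hypothetical.

```python
import csv
import io
from typing import Iterator, Tuple


def reduced_rows(text: str, min_amount: float) -> Iterator[Tuple[int, float]]:
    """Yield only (id, amount) for rows whose amount is at least min_amount."""
    for fields in csv.reader(io.StringIO(text)):
        # Columns beyond the first two are never converted: they remain raw
        # strings and are simply not passed downstream.
        amount = float(fields[1])
        if amount >= min_amount:          # plays the role of a Conditional Split
            yield int(fields[0]), amount


if __name__ == "__main__":
    sample = "1,5.0,2009-01-01,notes\n2,50.0,2009-01-02,more notes\n"
    print(list(reduced_rows(sample, 10.0)))
```

Filtering early and converting only what is used keeps both the row count and the row width down for everything that follows.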

Application

Is SSIS right for this?

Overhead of starting up an SSIS package may offset any performance gain over BCP for small data sets.

Is BCP good enough?

Is the greater manageability and control of SSIS needed?

Bulk Import Task vs. Data Flow

Location

Consider the following configuration …

Text file on Server 1 SQL Server on Server 2

Where should SSIS run?

(Licensing issues aside)

Measuring Performance

OVAL does not provide prescriptive guidance

Too many variables

Improve performance by applying OVAL and measuring

SSIS Logging

Performance counters

SQL Server Profiler

For extract queries, lookups and loading
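The measure-then-adjust idea can be illustrated outside SSIS with a small timing wrapper. This is a sketch only: the function name and the shape of the returned statistics are assumptions, and in a real package the numbers would come from SSIS logging, performance counters or Profiler.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def measure(stage_name: str,
            stage: Callable[[Iterable], Iterable],
            rows: Iterable) -> Tuple[List, Dict[str, object]]:
    """Run one stage to completion and report its row count and elapsed time."""
    start = time.perf_counter()
    output = list(stage(rows))            # force the stage to process every row
    elapsed = time.perf_counter() - start
    return output, {"stage": stage_name, "rows": len(output), "seconds": elapsed}


if __name__ == "__main__":
    doubled, stats = measure("double", lambda rs: (r * 2 for r in rs), range(100_000))
    print(stats)
```

Timing each stage separately shows which one dominates, so tuning effort goes to the slowest step rather than to a guess.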

Parallelism

Focus on critical path

Utilize available resources

Memory Constrained Reader and CPU Constrained

Let it rip! Optimize the slowest

Moving Ahead

Manageability Features

Logging and Log Providers

Checkpoint Restartability

Precedence Constraints

Configurations

SSIS Service

Checkpointing

Package loads

Checkpoint file created

Data Flow Task, then write checkpoint

Data Flow Task, then write checkpoint

Send Mail Task, then write checkpoint

Package completes

Checkpoint file deleted
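The checkpoint life cycle can be sketched as a small task runner that records completed tasks in a file and skips them on restart. This is an analogy under stated assumptions: the JSON file format, the function name and the task tuples are invented for the illustration and are not how SSIS stores its checkpoint data.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple

Task = Tuple[str, Callable[[], None]]


def run_with_checkpoint(tasks: List[Task], checkpoint_path: str) -> List[str]:
    """Run tasks in order, skipping those recorded in the checkpoint file."""
    done: List[str] = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:      # restart: resume from the file
            done = json.load(handle)
    else:
        with open(checkpoint_path, "w") as handle:  # first run: create the file
            json.dump(done, handle)
    executed: List[str] = []
    for name, action in tasks:
        if name in done:
            continue
        action()          # an exception here leaves the checkpoint file in place
        done.append(name)
        executed.append(name)
        with open(checkpoint_path, "w") as handle:  # write checkpoint
            json.dump(done, handle)
    os.remove(checkpoint_path)                      # package completed
    return executed


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "package.chk")
    steps = [("Data Flow 1", lambda: None), ("Data Flow 2", lambda: None),
             ("Send Mail", lambda: None)]
    print(run_with_checkpoint(steps, path))
```

If a task fails, the file survives with the names of the tasks that finished, so the next run starts at the failed task instead of repeating earlier work.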

Configuration Scenario

Multiple Configurations

Dev, Test, Production

Dev DB, Test DB, Prod DB

Machines where packages are being designed /tested /executed

Configuration updates package on load with DB locations (and mail server, file share locations….)

Package Handoff
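The configuration scenario amounts to overriding a few package properties at load time, depending on the environment. The sketch below models that with a dataclass; the property names, server names and the CONFIGURATIONS table are all hypothetical stand-ins, not SSIS configuration syntax.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Package:
    db_server: str
    mail_server: str
    file_share: str


# Hypothetical per-environment overrides; only listed properties are changed.
CONFIGURATIONS: Dict[str, Dict[str, str]] = {
    "dev": {"db_server": "DevDB"},
    "test": {"db_server": "TestDB"},
    "prod": {"db_server": "ProdDB", "mail_server": "mail.prod.example"},
}


def load_package(package: Package, environment: str) -> Package:
    """Return a copy of the package with the environment's overrides applied."""
    return replace(package, **CONFIGURATIONS[environment])


if __name__ == "__main__":
    designed = Package("DevDB", "mail.dev.example", r"\\devshare\files")
    print(load_package(designed, "prod"))
```

The package itself is never edited when it moves between machines; only the configuration that is applied on load differs.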

Precedence constraints

Directs Flow from object to object…

Basically, ‘when do I move on’

Success, Failure, Completion or one of those plus an expression (condition)

Dataflow Task

SendMail Task

Success

Completion

Failure

Success & expression
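The 'when do I move on' rule can be written as a small predicate. This is a sketch of the logic described above, with hypothetical names; it covers a required outcome optionally combined with an expression and is not the SSIS object model.

```python
from typing import Callable, Optional


def constraint_met(required: str,
                   outcome: str,
                   expression: Optional[Callable[[], bool]] = None) -> bool:
    """Decide whether the next task may run.

    required: "success", "failure" or "completion"
    outcome:  "success" or "failure", the result of the preceding task
    expression: optional extra condition that must also be true
    """
    if required == "completion":
        status_ok = outcome in ("success", "failure")  # runs either way
    else:
        status_ok = outcome == required
    return status_ok and (expression is None or expression())


if __name__ == "__main__":
    rows_loaded = 0
    # Send mail only if the data flow succeeded and actually loaded rows.
    print(constraint_met("success", "success", lambda: rows_loaded > 0))
```

Combining a status with an expression is what lets a package, for example, skip the notification step when a successful load moved no rows.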

Tackle the basics …

Deployment Flow

Tools to organize and ‘copy’ packages and supporting files

BI Studio

• Design package
• Add configurations
• Add miscellaneous files
• Set project deployment properties
• Build

User

• Copy/move deployment folder\files

• Choose destination (SQL Server or file system)
• Modify protection level
• Choose location of supporting files
• Change configurations
• Execute Installation Wizard

SQL Agent

• Create desired agent jobs

SQL Server Management Studio

Utilizes the SSIS service

Allows Monitoring of currently Executing packages

Maintain stored package structure

Ad hoc Package execution

Simple flow …

SSIS: Summary

Fast!

Data flows process large volumes of data efficiently, even through complex operations

Exceptional price/performance on multi-core

Feature Rich

Many pre-built adapters and transformations reduce hand coding

Extensible object model enables specialized custom or scripted components

Highly productive visual environment speeds development and debugging

Integral part of a complete BI stack (IS-AS-RS)

Beyond ETL

Enables integration of XML, RSS and Web Services data

Data cleansing features enable “difficult” data to be handled during loading

Data and Text mining allow “smart” handling of data for imputation of incomplete data, conditional processing of potential problems, or smart escalation of issues such as fraud detection

Your Feedback is Important!

Please Fill Out the feedback form

Questions !!!

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.