TSQL Coding Guidelines

Chris Adkin August 2011


T-SQL programming guidelines, in terms of:- 1. Commenting code 2. Code readability 3. General good practice 4. Defensive coding and error handling 5. Coding for performance and scalability

Transcript of TSQL Coding Guidelines

Page 1: TSQL Coding Guidelines

Chris Adkin

August 2011

Page 2: TSQL Coding Guidelines

Commenting code

Code readability

General best practice

Page 3: TSQL Coding Guidelines

Comments and exception handling have been purposely omitted from code fragments in the interest of brevity, such that each fragment can fit onto one slide.

Page 4: TSQL Coding Guidelines

All code should be self documenting.

T-SQL code artefacts, triggers, stored procedures and functions should have a standard comment banner.

Comment code at all points of interest, describe why and not what.

Avoid inline comments.

Page 5: TSQL Coding Guidelines

Comment banners should include:-

Author details.

A brief description of what the code does.

Narrative comments for all arguments.

Narrative comments for return types.

Change control information.

An example is provided on the next slide.

Page 6: TSQL Coding Guidelines

CREATE PROCEDURE
/*===================================================================================*/
/*                                                                                   */
/* Name       :                                                                      */
    uspMyProc
/*                                                                                   */
/* Description: Stored procedure to demonstrate what a specimen comment banner       */
/*              should look like.                                                    */
/*                                                                                   */
/* Parameters :                                                                      */
/*                                                                                   */
(
    @Parameter1 int, /* First parameter passed into procedure.  */
    /* --------------------------------------------------------------------------- */
    @Parameter2 int  /* Second parameter passed into procedure. */
)
/*                                                                                   */
/* Change History                                                                    */
/* ~~~~~~~~~~~~~~                                                                    */
/*                                                                                   */
/* Version Author               Date     Ticket Description                          */
/* ------- -------------------- -------- ------ ------------------------------------ */
/* 1.0     C. J. Adkin          09/08/11 3525   Initial version created.             */
/*                                                                                   */
/*===================================================================================*/
AS
BEGIN

.

.

Page 7: TSQL Coding Guidelines

-- This is an example of an inline comment

Why are these bad?

Because a careless backspace can turn a useful statement into a commented out one.

“But my code is always thoroughly tested.”

NO EXCUSE, always code defensively.

Use /* */ comments instead.
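A minimal sketch of the hazard described above, using a hypothetical temporary table (#Staging and its columns are assumptions made for illustration, not part of the original slides):

```sql
/* Hypothetical staging table, for illustration only. */
CREATE TABLE #Staging (Id int, Processed bit);

INSERT INTO #Staging (Id, Processed) VALUES (1, 1);
INSERT INTO #Staging (Id, Processed) VALUES (2, 0);

/* Fragile: if the line break after the inline comment is lost, the WHERE
 * clause becomes part of the comment and every row is deleted:
 *
 *     DELETE FROM #Staging -- purge processed rows
 *     WHERE Processed = 1;
 */

/* Robust: a block comment is closed explicitly, so joining the two lines
 * leaves the statement unchanged. */
DELETE FROM #Staging /* purge processed rows */
WHERE Processed = 1;

DROP TABLE #Staging;
```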

Page 8: TSQL Coding Guidelines

Use and adhere to naming conventions.

Use meaningful object names.

Never prefix application stored procedures with sp_.

SQL Server looks for procedures with this prefix in the master database first, before resolving them in the current database.

This is bad for performance.

Page 9: TSQL Coding Guidelines

Use ANSI SQL join syntax over non-ANSI syntax.

Be consistent when using case:-

Camel case

Pascal case

Use of upper case for reserved key words

Be consistent when indenting and stacking text.
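As a short illustration of the join guideline, the two statements below should return the same rows from the AdventureWorks tables used elsewhere in these slides. The aliases and the choice of columns are illustrative assumptions.

```sql
/* Non-ANSI (old style) join: the join condition is buried in the WHERE clause. */
SELECT Pi.ProductID,
       Pi.Quantity,
       Loc.Name
FROM   Production.ProductInventory Pi,
       Production.Location         Loc
WHERE  Pi.LocationID = Loc.LocationID;

/* ANSI join: the join condition is stated explicitly in the ON clause,
 * leaving the WHERE clause free for filters. */
SELECT     Pi.ProductID,
           Pi.Quantity,
           Loc.Name
FROM       Production.ProductInventory Pi
INNER JOIN Production.Location         Loc
        ON Pi.LocationID = Loc.LocationID;
```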

Page 10: TSQL Coding Guidelines

BEST PRACTICES

Page 11: TSQL Coding Guidelines

Never blindly take technical hints and tips written in a blog or presentation as gospel.

Test your assumptions using “Scientific method”, i.e.:-

Use test cases which use consistent test data across all tests, production realistic data is preferable.

If the data is commercially sensitive, e.g. bank account details, keep the volume and distribution the same and obfuscate the sensitive parts.

Only change one thing at a time, so that the impact of each change can be gauged accurately and attributed to its cause.

Page 12: TSQL Coding Guidelines

The “Scientific Method” Approach

• For performance related tests, always clear the procedure cache and the buffer cache first, so that results are not skewed between tests. On a test system, use the following:-

– CHECKPOINT

– DBCC FREEPROCCACHE

– DBCC DROPCLEANBUFFERS
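A minimal sketch of a repeatable timing harness built from the three commands above. It assumes a test instance (the commands flush caches for the whole server) and SQL Server 2008 or later; the query under test is only a placeholder.

```sql
/* Test systems only: these commands affect the whole instance. */
CHECKPOINT;             /* Write dirty pages to disk so that they become clean. */
DBCC FREEPROCCACHE;     /* Discard all cached execution plans.                  */
DBCC DROPCLEANBUFFERS;  /* Remove clean pages from the buffer pool.             */

DECLARE @Start datetime2 = SYSDATETIME();

/* Statement under test goes here; this query is only a placeholder. */
SELECT COUNT(*) AS OrderLines
FROM   Sales.SalesOrderDetail;

SELECT DATEDIFF(ms, @Start, SYSDATETIME()) AS ElapsedMs;
```

Running the same script before and after each single change gives timings that start from the same cold-cache state.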

Page 13: TSQL Coding Guidelines

Instrumentation means furnishing the code with a facility to allow its execution to be traced.

Write to a tracking table

And / or use xp_logevent to write to the event log.

DO NOT make the code a “Black box” which has to be dissected statement by statement in production if it starts to fail.
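The two tracing options above might look something like the sketch below. The dbo.ProcessLog table, its columns, the message text and the error number 60000 are all assumptions made for illustration; xp_logevent requires a user-defined error number greater than 50000 and elevated permissions.

```sql
/* Hypothetical tracking table; the name and columns are assumptions. */
IF OBJECT_ID(N'dbo.ProcessLog', N'U') IS NULL
    CREATE TABLE dbo.ProcessLog (
        LogId         int IDENTITY (1, 1) PRIMARY KEY,
        LoggedAt      datetime     NOT NULL DEFAULT (GETDATE()),
        ProcName      sysname      NOT NULL,
        Step          varchar(100) NOT NULL,
        RowsProcessed int          NULL
    );

DECLARE @Rows int;

/* Placeholder for a real processing step. */
SELECT @Rows = COUNT(*)
FROM   Sales.SalesOrderDetail;

/* Option 1: record the step in the tracking table. */
INSERT INTO dbo.ProcessLog (ProcName, Step, RowsProcessed)
VALUES (N'uspMyProc', 'Counted order lines', @Rows);

/* Option 2: write to the SQL Server error log and the Windows event log. */
EXEC master.dbo.xp_logevent 60000, 'uspMyProc: counted order lines.', 'informational';
```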

Page 14: TSQL Coding Guidelines

“Row By Agonizing Row” is a term coined by Jeff Moden, an MVP and frequent poster on SQLServerCentral.com.

Alludes to:-

Coding in a procedural 3GL way instead of a set based way.

The chronically poor performance of row by row oriented processing.

Abbreviated to RBAR, pronounced Ree-bar.

Page 15: TSQL Coding Guidelines

Code whereby result sets and table contents are processed line by line, typically using cursors.

Correlated subqueries.

User Defined Functions.

Iterating through result sets as ADO objects in SQL Server Integration Services looping containers.

Page 16: TSQL Coding Guidelines

A simple, but contrived query written against the AdventureWorks2008R2 database.

The first query will use nested subqueries.

The second will use derived tables.

Page 17: TSQL Coding Guidelines

SELECT ProductID,

Quantity

FROM AdventureWorks.Production.ProductInventory Pi

WHERE LocationID = (SELECT TOP 1

LocationID

FROM AdventureWorks.Production.Location Loc

WHERE Pi.LocationID = Loc.LocationID

AND CostRate = (SELECT MAX(CostRate)

FROM AdventureWorks.Production.Location) )

Page 18: TSQL Coding Guidelines

SELECT ProductID,

Quantity

FROM (SELECT TOP 1

LocationID

FROM AdventureWorks.Production.Location Loc

WHERE CostRate = (SELECT MAX(CostRate)

FROM AdventureWorks.Production.Location) ) dt,

AdventureWorks.Production.ProductInventory Pi

WHERE Pi.LocationID = dt.LocationID

Page 19: TSQL Coding Guidelines

What is the difference between the two queries?

Query 1, cost = 0.299164

Query 2, cost = 0.0202938

What is the crucial difference?

Table spool operation in the first plan has been executed 1069 times.

This happens to be the number of rows in the ProductInventory table.

Page 20: TSQL Coding Guidelines

Row oriented processing may be unavoidable under certain circumstances:-

The processing of one row depends on the state of one or more previous rows in a result set.

The row processing logic involves a change to the global state of the database and therefore cannot be encapsulated in a function.

In these cases there are ways to use cursors in a very efficient manner, as per the next three slides.

Page 21: TSQL Coding Guidelines

Elapsed time 00:22:27.892

DECLARE @MaxRownum int,

@OrderId int,

@i int;

SET @i = 1;

CREATE TABLE #OrderIds (

rownum int IDENTITY (1, 1),

OrderId int

);

INSERT INTO #OrderIds

SELECT SalesOrderID

FROM Sales.SalesOrderDetail;

SELECT @MaxRownum = MAX(rownum)

FROM #OrderIds;

WHILE @i <= @MaxRownum

BEGIN

SELECT @OrderId = OrderId

FROM #OrderIds

WHERE rownum = @i;

SET @i = @i + 1;

END;

Page 22: TSQL Coding Guidelines

Elapsed time 00:00:03.106

DECLARE @s int;

DECLARE c CURSOR FOR

SELECT SalesOrderID

FROM Sales.SalesOrderDetail;

OPEN c;

FETCH NEXT FROM c INTO @s;

WHILE @@FETCH_STATUS = 0

BEGIN

FETCH NEXT FROM c INTO @s;

END;

CLOSE c;

DEALLOCATE c;

Page 23: TSQL Coding Guidelines

Elapsed time 00:00:01.555

DECLARE @s int;

DECLARE c CURSOR FAST_FORWARD FOR

SELECT SalesOrderID

FROM Sales.SalesOrderDetail;

OPEN c;

FETCH NEXT FROM c INTO @s;

WHILE @@FETCH_STATUS = 0

BEGIN

FETCH NEXT FROM c INTO @s;

END;

CLOSE c;

DEALLOCATE c;

Page 24: TSQL Coding Guidelines

No T-SQL language feature is a “Panacea to all ills”.

For example:-

Avoid RBAR logic where possible

Avoid nesting cursors

But cursors do have their uses.

Be aware of the FAST_FORWARD optimisation, applicable when:-

The data being retrieved is not being modified

The cursor is being scrolled through in a forward only direction

Page 25: TSQL Coding Guidelines

When using SQL Server 2005 onwards:-

Use TRY CATCH blocks.

Make the event logged in the CATCH block verbose enough to allow the exceptional event to be easily tracked down.

NEVER use exceptions for control flow, illustrated with an upsert example in the next four slides.

NEVER ‘Swallow’ exceptions, i.e. catch them and do nothing with them.
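A minimal sketch of a verbose CATCH block that re-raises rather than swallows. The deliberate division by zero stands in for real work, and the message layout is an assumption; RAISERROR is used because it is available from SQL Server 2005.

```sql
BEGIN TRY
    /* Placeholder for real work; the division deliberately fails. */
    SELECT 1 / 0 AS WillFail;
END TRY
BEGIN CATCH
    DECLARE @Msg      nvarchar(2048),
            @Severity int,
            @State    int;

    /* Capture enough detail to track the failure down later. */
    SELECT @Msg      = N'Error ' + CAST(ERROR_NUMBER() AS nvarchar(10))
                     + N', line ' + CAST(ERROR_LINE() AS nvarchar(10))
                     + N', in ' + ISNULL(ERROR_PROCEDURE(), N'ad hoc batch')
                     + N': ' + ERROR_MESSAGE(),
           @Severity = ERROR_SEVERITY(),
           @State    = ERROR_STATE();

    /* Log @Msg here, e.g. to a tracking table, then re-raise so that the
     * caller sees the failure instead of having it swallowed. */
    RAISERROR (N'%s', @Severity, @State, @Msg);
END CATCH;
```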

Page 26: TSQL Coding Guidelines

DECLARE @p int;

DECLARE c CURSOR FAST_FORWARD FOR

SELECT ProductID

FROM Sales.SalesOrderDetail;

OPEN c;

FETCH NEXT FROM c INTO @p;

WHILE @@FETCH_STATUS = 0

BEGIN

/* Place the stored procedure to be tested
 * on the line below.
 */
EXEC dbo.uspUpsert_V1 @p;

/* Fetch the next row only after the current one has been processed. */
FETCH NEXT FROM c INTO @p;

END;

CLOSE c;

DEALLOCATE c;

Page 27: TSQL Coding Guidelines

CREATE TABLE SalesByProduct (

ProductID int,

Sold int,

CONSTRAINT [PK_SalesByProduct] PRIMARY KEY CLUSTERED

(

ProductID

) ON [USERDATA]

) ON [USERDATA]

Page 28: TSQL Coding Guidelines

Execution time = 00:00:51.200

CREATE PROCEDURE uspUpsert_V1 (@ProductID int) AS

BEGIN

SET NOCOUNT ON;

BEGIN TRY

INSERT INTO SalesByProduct

VALUES (@ProductID, 1);

END TRY

BEGIN CATCH

IF ERROR_NUMBER() = 2627

BEGIN

UPDATE SalesByProduct

SET Sold += 1

WHERE ProductID = @ProductID;

END

END CATCH;

END;

Page 29: TSQL Coding Guidelines

Execution time = 00:00:20.080

CREATE PROCEDURE uspUpsert_V2 (@ProductID int) AS

BEGIN

SET NOCOUNT ON;

UPDATE SalesByProduct

SET Sold += 1

WHERE ProductID = @ProductID;

IF @@ROWCOUNT = 0

BEGIN

INSERT INTO SalesByProduct

VALUES (@ProductID, 1);

END;

END;

Page 30: TSQL Coding Guidelines

From SQL Server 2008 onwards, consider using the MERGE statement for upserts. Execution time = 00:00:20.904

CREATE PROCEDURE uspUpsert_V3 (@ProductID int) AS

BEGIN

SET NOCOUNT ON;

MERGE SalesByProduct AS target

USING (SELECT @ProductID) AS source (ProductID)

ON (target.ProductID = source.ProductID)

WHEN MATCHED THEN

UPDATE

SET Sold += 1

WHEN NOT MATCHED THEN

INSERT (ProductID, Sold)

VALUES (source.ProductID, 1);

END;

Page 31: TSQL Coding Guidelines

Understand and use the full power of T-SQL.

Most people know how to UNION result sets together, but do not know about INTERSECT and EXCEPT.

Also, a lot of development effort can be saved by using T-SQL’s analytic extensions where appropriate:-

RANK()

DENSE_RANK()

NTILE()

ROW_NUMBER()

LEAD() and LAG() (introduced in SQL Server 2012, code-named Denali)
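A brief sketch of the set operators and ranking functions listed above, written against the AdventureWorks tables used elsewhere in these slides. The choice of tables and columns is illustrative only.

```sql
/* Products that are held in inventory AND have been ordered. */
SELECT ProductID FROM Production.ProductInventory
INTERSECT
SELECT ProductID FROM Sales.SalesOrderDetail;

/* Products that are held in inventory but have never been ordered. */
SELECT ProductID FROM Production.ProductInventory
EXCEPT
SELECT ProductID FROM Sales.SalesOrderDetail;

/* Number and rank the order lines for each product, largest quantity first. */
SELECT ProductID,
       SalesOrderID,
       OrderQty,
       ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY OrderQty DESC) AS RowNum,
       RANK()       OVER (PARTITION BY ProductID ORDER BY OrderQty DESC) AS Rnk,
       DENSE_RANK() OVER (PARTITION BY ProductID ORDER BY OrderQty DESC) AS DenseRnk
FROM   Sales.SalesOrderDetail;
```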

Page 32: TSQL Coding Guidelines

Scalar functions are another example of RBAR, consider this function:-

CREATE FUNCTION udfMinProductQty ( @ProductID int )

RETURNS int

AS

BEGIN

RETURN ( SELECT MIN(OrderQty)

FROM Sales.SalesOrderDetail

WHERE ProductId = @ProductID )

END;

Page 33: TSQL Coding Guidelines

Now let's call the function from an example query:-

SELECT ProductId,

dbo.udfMinProductQty(ProductId)

FROM Production.Product

Elapsed time = 00:00:00.746

Page 34: TSQL Coding Guidelines

Now doing the same thing, but using an inline table valued function:-

CREATE FUNCTION tvfMinProductQty (

@ProductId INT

)

RETURNS TABLE

AS

RETURN (

SELECT MIN(s.OrderQty) AS MinOrdQty

FROM Sales.SalesOrderDetail s

WHERE s.ProductId = @ProductId

)

Page 35: TSQL Coding Guidelines

Invoking the inline TVF from a query:-

SELECT ProductId,

(SELECT MinOrdQty

FROM dbo.tvfMinProductQty(ProductId) ) MinOrdQty

FROM Production.Product

ORDER BY ProductId

Elapsed time 00:00:00.330
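An inline table valued function can also be invoked with CROSS APPLY, which is a common alternative to the scalar subquery form. The sketch below assumes the dbo.tvfMinProductQty function from the previous slide exists; it was not timed as part of the original tests.

```sql
SELECT      p.ProductID,
            q.MinOrdQty
FROM        Production.Product p
CROSS APPLY dbo.tvfMinProductQty(p.ProductID) q
ORDER BY    p.ProductID;
```

Because the function body is an aggregate without GROUP BY, it returns exactly one row per call, so CROSS APPLY keeps products that have no order lines (with a NULL quantity).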

Page 36: TSQL Coding Guidelines

Leverage functionality already in SQL Server and never reinvent it. This will lead to:-

More robust code

Less development effort

Potentially faster code

Code with better readability

Easier to maintain code

Page 37: TSQL Coding Guidelines

A scenario that actually happened:-

A row is inserted into the customer table

Customer table has a primary key based on an identity column

@@IDENTITY is used to obtain the key value of the customer row inserted for the creation of an order row with a foreign key linking back to customer.

The identity value obtained is nothing like the one for the inserted row – why?

Page 38: TSQL Coding Guidelines

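@@IDENTITY returns the last identity value generated in the current session, irrespective of the scope that generated it, so an insert performed by a trigger (such as those that replication adds to a table) can change it.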

In the example the replication merge agent inserted a row in the customer table just before @@IDENTITY was used.

The solution: always use SCOPE_IDENTITY() instead of @@IDENTITY.
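A minimal sketch of the safer pattern, using hypothetical temporary tables in place of the real customer and order tables (all table and column names here are assumptions):

```sql
/* Hypothetical stand-ins for the customer and order tables. */
CREATE TABLE #Customer (
    CustomerId int IDENTITY (1, 1) PRIMARY KEY,
    Name       varchar(50) NOT NULL
);

CREATE TABLE #CustomerOrder (
    OrderId    int IDENTITY (1, 1) PRIMARY KEY,
    CustomerId int NOT NULL
);

DECLARE @CustomerId int;

INSERT INTO #Customer (Name)
VALUES ('Example customer');

/* Last identity value generated in this session AND in this scope,
 * so identity values generated inside triggers are ignored. */
SET @CustomerId = SCOPE_IDENTITY();

INSERT INTO #CustomerOrder (CustomerId)
VALUES (@CustomerId);

/* Alternative: capture the key directly from the INSERT with OUTPUT. */
DECLARE @NewIds TABLE (CustomerId int);

INSERT INTO #Customer (Name)
OUTPUT inserted.CustomerId INTO @NewIds
VALUES ('Another customer');

SELECT CustomerId FROM @NewIds;

DROP TABLE #CustomerOrder;
DROP TABLE #Customer;
```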

Page 39: TSQL Coding Guidelines

Developing applications that use a database and perform well depends on good:-

Schema design

Compiled statement plan reuse.

Connection management.

Minimizing the number of network round trips between the database and the tier above.

Page 40: TSQL Coding Guidelines

Parameterise your queries in order to minimize compiling.

BUT, watch out for “Parameter sniffing”.

When a query is compiled, the database engine sniffs the parameter values it was invoked with and creates a plan accordingly.

This is unfortunate when the sniffed values cause a plan with table scans, while the ‘Popular’ values would lead to a plan with index seeks.

Page 41: TSQL Coding Guidelines

Use the RECOMPILE hint to force the creation of a new plan.

Use the OPTIMIZE FOR hint in order for a plan to be created for ‘Popular’ values you specify.

Use the OPTIMIZE FOR UNKNOWN hint to cause a “General purpose” plan to be created.

Copy parameters passed into a stored procedure to local variables and use those in your query.
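The four options above could be written as in the sketch below. The procedure name is hypothetical and the value 870 is an arbitrary stand-in for a ‘popular’ ProductID; a real procedure would use only one of these techniques, they are shown together purely for comparison.

```sql
CREATE PROCEDURE dbo.uspOrderLinesForProduct (@ProductID int) AS
BEGIN
    SET NOCOUNT ON;

    /* 1. Compile a fresh plan for this statement on every execution. */
    SELECT SalesOrderID, OrderQty
    FROM   Sales.SalesOrderDetail
    WHERE  ProductID = @ProductID
    OPTION (RECOMPILE);

    /* 2. Always compile the plan as if a chosen 'popular' value was passed. */
    SELECT SalesOrderID, OrderQty
    FROM   Sales.SalesOrderDetail
    WHERE  ProductID = @ProductID
    OPTION (OPTIMIZE FOR (@ProductID = 870));

    /* 3. Compile a general purpose plan from average density statistics. */
    SELECT SalesOrderID, OrderQty
    FROM   Sales.SalesOrderDetail
    WHERE  ProductID = @ProductID
    OPTION (OPTIMIZE FOR UNKNOWN);

    /* 4. Copy the parameter to a local variable, whose value is not
     *    sniffed when the plan is compiled. */
    DECLARE @LocalProductID int;
    SET @LocalProductID = @ProductID;

    SELECT SalesOrderID, OrderQty
    FROM   Sales.SalesOrderDetail
    WHERE  ProductID = @LocalProductID;
END;
```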

Page 42: TSQL Coding Guidelines

For OLTP style applications:-

Transactions will be short

Number of statements will be finite

SQL will only affect a few rows for each execution.

The SQL will be simple.

Plans will be skewed towards using index seeks over table scans.

Recompiles could more than double query execution time.

Therefore recompiles are undesirable for OLTP applications.

Page 43: TSQL Coding Guidelines

For OLAP style applications:-

Complex queries that may involve aggregation and analytic SQL.

Queries may change constantly due to the use of reporting and BI tools.

May involve WHERE clauses with potentially lots of combinations of parameters.

Forcing a recompile via OPTION (RECOMPILE) may be worth taking a hit on for the benefit of a significant reduction in total execution time.

This is the exception to the rule.

Page 44: TSQL Coding Guidelines

Be careful when using table variables.

Statistics cannot be gathered on these

Unless the statement is recompiled, the optimizer will assume they only contain one row.

This can lead to unexpected execution plans.

Statements that modify table variables will always inhibit parallelism in execution plans.
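One way to see the effect is to load the same rows into a table variable and a temporary table and compare the estimated row counts in the two execution plans. This is only a sketch against AdventureWorks; the object names are illustrative.

```sql
/* Table variable: no statistics are available to the optimizer. */
DECLARE @OrderIds TABLE (SalesOrderID int PRIMARY KEY);

INSERT INTO @OrderIds (SalesOrderID)
SELECT SalesOrderID
FROM   Sales.SalesOrderHeader;

SELECT     COUNT(*) AS OrderLines
FROM       @OrderIds o
INNER JOIN Sales.SalesOrderDetail d
        ON d.SalesOrderID = o.SalesOrderID;

/* Temporary table: statistics can be created, so the estimate
 * can reflect the real number of rows. */
CREATE TABLE #OrderIds (SalesOrderID int PRIMARY KEY);

INSERT INTO #OrderIds (SalesOrderID)
SELECT SalesOrderID
FROM   Sales.SalesOrderHeader;

SELECT     COUNT(*) AS OrderLines
FROM       #OrderIds o
INNER JOIN Sales.SalesOrderDetail d
        ON d.SalesOrderID = o.SalesOrderID;

DROP TABLE #OrderIds;
```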

Page 45: TSQL Coding Guidelines

Sargability applies to conditions in WHERE clauses.

If a WHERE clause condition can use an index, it is said to be ‘Sargable’, i.e. a searchable argument.

As a general rule of thumb, the use of a function on a column will suppress index usage,

e.g. WHERE ufn(MyColumn1) = <somevalue>
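A short sketch of the same filter written both ways. It assumes an index exists on Sales.SalesOrderHeader.OrderDate, which may not be the case in a stock AdventureWorks database; the year is arbitrary.

```sql
/* Not sargable: the function wrapped around OrderDate prevents
 * an index seek on that column. */
SELECT SalesOrderID
FROM   Sales.SalesOrderHeader
WHERE  YEAR(OrderDate) = 2008;

/* Sargable: the column is left bare and compared against a range,
 * so an index on OrderDate can be used for a seek. */
SELECT SalesOrderID
FROM   Sales.SalesOrderHeader
WHERE  OrderDate >= '20080101'
AND    OrderDate <  '20090101';
```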

Page 46: TSQL Coding Guidelines

Constructs that will always force a serial plan:-

All T-SQL user defined functions.

All CLR user defined functions with data access.

Built-in functions including: @@TRANCOUNT, ERROR_NUMBER() and OBJECT_ID().

Dynamic cursors.

Page 47: TSQL Coding Guidelines

Constructs that will always force a serial region within a plan:-

Table valued functions

TOP

Recursive queries

Multi consumer spool

Sequence functions

System table scans

“Backwards” scans

Global scalar aggregate

Page 48: TSQL Coding Guidelines

Make stored procedures and functions relatively single minded in what they do.

Stored procedures and functions with lots of arguments are a “Code smell”, indicating code that:-

Is difficult to unit test with a high degree of confidence.

Does not lend itself to code reuse.

Smacks of poor design.

Page 49: TSQL Coding Guidelines

An ‘Ordinal’ in the context of the ORDER BY clause is when numbers are used to represent column positions.

If new columns are added or their order is changed in the SELECT list, this query will sort by a different column and return different results, potentially breaking the application using it.

SELECT TOP 5

[SalesOrderNumber]

,[OrderDate]

,[DueDate]

,[ShipDate]

,[Status]

FROM [AdventureWorks].[Sales].[SalesOrderHeader]

ORDER BY 2 DESC
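The same query with the ordinal replaced by the column it currently refers to, so that changes to the SELECT list no longer alter the sort order:

```sql
SELECT TOP 5
       [SalesOrderNumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
      ,[Status]
FROM   [AdventureWorks].[Sales].[SalesOrderHeader]
ORDER BY [OrderDate] DESC;
```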

Page 50: TSQL Coding Guidelines

SELECT * retrieves all columns from a table, which is bad for performance if only a subset of these is required.

Using columns by their names explicitly leads to improved code readability.

Code is easier to maintain, as it enables the “Developer” to see in situ what columns a query is using.