TSQL Coding Guidelines
Chris Adkin
August 2011
Commenting code
Code readability
General best practice
Comments and exception handling have been purposely omitted from code fragments in the interest of brevity, such that each fragment can fit onto one slide.
All code should be self documenting.
T-SQL code artefacts, triggers, stored procedures and functions should have a standard comment banner.
Comment code at all points of interest, describe why and not what.
Avoid inline (--) comments.
Comment banners should include:-
Author details.
A brief description of what the code does.
Narrative comments for all arguments.
Narrative comments for return types.
Change control information.
An example is provided on the next slide
CREATE PROCEDURE uspMyProc
/*===================================================================================*/
/*                                                                                   */
/* Name        : uspMyProc                                                           */
/*                                                                                   */
/* Description : Stored procedure to demonstrate what a specimen comment banner      */
/*               should look like.                                                   */
/*                                                                                   */
/* Parameters  :                                                                     */
/*                                                                                   */
(
    @Parameter1 int, /* First parameter passed into procedure.  */
    /* ------------------------------------------------------------------------ */
    @Parameter2 int  /* Second parameter passed into procedure. */
)
/*                                                                                   */
/* Change History                                                                    */
/* ~~~~~~~~~~~~~~                                                                    */
/*                                                                                   */
/* Version Author               Date     Ticket Description                          */
/* ------- -------------------- -------- ------ ------------------------------------ */
/* 1.0     C. J. Adkin          09/08/11 3525   Initial version created.             */
/*                                                                                   */
/*===================================================================================*/
AS
BEGIN
    .
    .
    -- This is an example of an inline comment
Why are inline (--) comments bad?
Because a careless backspace can turn a useful statement into a commented-out one.
"But my code is always thoroughly tested."
NO EXCUSE, always code defensively.
Use /* */ comments instead.
Use and adhere to naming conventions.
Use meaningful object names.
Never prefix application stored procedures with sp_.
SQL Server will always scan through the system catalogue first before executing such procedures.
Bad for performance.
Use ANSI SQL join syntax over non-ANSI syntax.
Be consistent when using case:-
Camel case
Pascal case
Use of upper case for reserved key words
Be consistent when indenting and stacking text.
BEST PRACTICES
Never blindly take technical hints and tips written in a blog or presentation as gospel.
Test your assumptions using “Scientific method”, i.e.:-
Use test cases which use consistent test data across all tests, production realistic data is preferable.
If the data is commercially sensitive, e.g. bank account details, keep the volume and distribution the same, obfuscate the sensitive parts out.
Only change one thing at a time, so as to be able to gauge the impact of the change accurately and know what effected the change.
The “Scientific Method” Approach
• For performance related tests always clear the procedure and buffer cache out, so that results are not skewed between tests, use the following:-
– CHECKPOINT
– DBCC FREEPROCCACHE
– DBCC DROPCLEANBUFFERS
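As a sketch, a timed test might be wrapped like this (this assumes sysadmin rights and a non-production instance, since these commands flush caches for the whole server):

```sql
/* Flush dirty pages to disk, then clear the plan and buffer caches
   so every test run starts cold. NEVER run this in production. */
CHECKPOINT;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

SET STATISTICS TIME ON;
/* ... the query under test goes here ... */
SET STATISTICS TIME OFF;
```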
Instrument your code: furnish it with a facility that allows its execution to be traced.
Write to a tracking table.
And/or use xp_logevent to write to the event log.
DO NOT make the code a "black box" which has to be dissected statement by statement in production if it starts to fail.
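A minimal sketch of such a tracing facility; the dbo.ProcTrace table and the message text are illustrative, not from the original deck:

```sql
/* Hypothetical tracking table for procedure execution tracing. */
CREATE TABLE dbo.ProcTrace (
    TraceId  int IDENTITY(1, 1) PRIMARY KEY,
    ProcName sysname,
    Message  nvarchar(2048),
    LoggedAt datetime DEFAULT GETDATE()
);
GO
CREATE PROCEDURE dbo.uspTracedProc AS
BEGIN
    BEGIN TRY
        /* ... body of the procedure ... */
        INSERT INTO dbo.ProcTrace (ProcName, Message)
        VALUES (OBJECT_NAME(@@PROCID), N'Completed successfully.');
    END TRY
    BEGIN CATCH
        DECLARE @msg nvarchar(2048);
        SET @msg = N'Failed: error ' + CAST(ERROR_NUMBER() AS nvarchar(10))
                 + N' at line '      + CAST(ERROR_LINE()   AS nvarchar(10))
                 + N': '             + ERROR_MESSAGE();
        INSERT INTO dbo.ProcTrace (ProcName, Message)
        VALUES (OBJECT_NAME(@@PROCID), @msg);
        /* Optionally also surface the failure in the Windows event log. */
        EXEC master.dbo.xp_logevent 60000, @msg, 'ERROR';
        /* Re-raise so the caller still sees the failure. */
        RAISERROR(@msg, 16, 1);
    END CATCH;
END;
```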
RBAR, pronounced Ree-bar, is a term coined by Jeff Moden, an MVP and frequent poster on SQLServerCentral.com.
Alludes to:-
Coding in a procedural 3GL way instead of a set based way.
The chronic performance of row by row oriented processing.
Examples include:-
Code whereby result sets and table contents are processed line by line, typically using cursors.
Correlated subqueries.
User defined functions.
Iterating through result sets as ADO objects in SQL Server Integration Services looping containers.
A simple, but contrived query written against the AdventureWorks2008R2 database.
The first query will use nested subqueries.
The second will use derived tables.
SELECT ProductID,
Quantity
FROM AdventureWorks.Production.ProductInventory Pi
WHERE LocationID = (SELECT TOP 1
LocationID
FROM AdventureWorks.Production.Location Loc
WHERE Pi.LocationID = Loc.LocationID
AND CostRate = (SELECT MAX(CostRate)
FROM AdventureWorks.Production.Location) )
SELECT ProductID,
Quantity
FROM (SELECT TOP 1
LocationID
FROM AdventureWorks.Production.Location Loc
WHERE CostRate = (SELECT MAX(CostRate)
FROM AdventureWorks.Production.Location) ) dt,
AdventureWorks.Production.ProductInventory Pi
WHERE Pi.LocationID = dt.LocationID
What is the difference between the two queries?
Query 1, cost = 0.299164
Query 2, cost = 0.0202938
What is the crucial difference ?
Table spool operation in the first plan has been executed 1069 times.
This happens to be the number of rows in the ProductInventory table.
Row oriented processing may be unavoidable under certain circumstances:-
The processing of one row depends on the state of one or more previous rows in a result set.
The row processing logic involves a change to the global state of the database and therefore cannot be encapsulated in a function.
In this case there are ways to use cursors in a very efficient manner
As per the next three slides.
Elapsed time 00:22:27.892
DECLARE @MaxRownum int,
@OrderId int,
@i int;
SET @i = 1;
CREATE TABLE #OrderIds (
rownum int IDENTITY (1, 1),
OrderId int
);
INSERT INTO #OrderIds
SELECT SalesOrderID
FROM Sales.SalesOrderDetail;
SELECT @MaxRownum = MAX(rownum)
FROM #OrderIds;
WHILE @i <= @MaxRownum
BEGIN
SELECT @OrderId = OrderId
FROM #OrderIds
WHERE rownum = @i;
SET @i = @i + 1;
END;
Elapsed time 00:00:03.106
DECLARE @s int;
DECLARE c CURSOR FOR
SELECT SalesOrderID
FROM Sales.SalesOrderDetail;
OPEN c;
FETCH NEXT FROM c INTO @s;
WHILE @@FETCH_STATUS = 0
BEGIN
FETCH NEXT FROM c INTO @s;
END;
CLOSE c;
DEALLOCATE c;
Elapsed time 00:00:01.555
DECLARE @s int;
DECLARE c CURSOR FAST_FORWARD FOR
SELECT SalesOrderID
FROM Sales.SalesOrderDetail;
OPEN c;
FETCH NEXT FROM c INTO @s;
WHILE @@FETCH_STATUS = 0
BEGIN
FETCH NEXT FROM c INTO @s;
END;
CLOSE c;
DEALLOCATE c;
No T-SQL language feature is a “Panacea to all ills”.
For example:-
Avoid RBAR logic where possible
Avoid nesting cursors
But cursors do have their uses.
Be aware of the FAST_FORWARD optimisation, applicable when:-
The data being retrieved is not being modified
The cursor is being scrolled through in a forward only direction
When using SQL Server 2005 onwards:-
Use TRY CATCH blocks.
Make the event logged in CATCH block verbose enough to allow the exceptional event to be easily tracked down.
NEVER use exceptions for control flow, illustrated with an upsert example in the next four slides.
NEVER ‘Swallow’ exceptions, i.e. catch them and do nothing with them.
DECLARE @p int;
DECLARE c CURSOR FAST_FORWARD FOR
SELECT ProductID
FROM Sales.SalesOrderDetail;
OPEN c;
FETCH NEXT FROM c INTO @p;
WHILE @@FETCH_STATUS = 0
BEGIN
/* Place the stored procedure to be tested
 * on the line below.
 */
EXEC dbo.uspUpsert_V1 @p;
FETCH NEXT FROM c INTO @p;
END;
CLOSE c;
DEALLOCATE c;
CREATE TABLE SalesByProduct (
ProductID int,
Sold int,
CONSTRAINT [PK_SalesByProduct] PRIMARY KEY CLUSTERED
(
ProductID
) ON [USERDATA]
) ON [USERDATA]
Execution time = 00:00:51.200
CREATE PROCEDURE uspUpsert_V1 (@ProductID int) AS
BEGIN
SET NOCOUNT ON;
BEGIN TRY
INSERT INTO SalesByProduct
VALUES (@ProductID, 1);
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2627
BEGIN
UPDATE SalesByProduct
SET Sold += 1
WHERE ProductID = @ProductID;
END
END CATCH;
END;
Execution time = 00:00:20.080
CREATE PROCEDURE uspUpsert_V2 (@ProductID int) AS
BEGIN
SET NOCOUNT ON;
UPDATE SalesByProduct
SET Sold += 1
WHERE ProductID = @ProductID;
IF @@ROWCOUNT = 0
BEGIN
INSERT INTO SalesByProduct
VALUES (@ProductID, 1);
END;
END;
With SQL Server 2008 onwards, consider using the MERGE statement for upserts, execution time = 00:00:20.904
CREATE PROCEDURE uspUpsert_V3 (@ProductID int) AS
BEGIN
SET NOCOUNT ON;
MERGE SalesByProduct AS target
USING (SELECT @ProductID) AS source (ProductID)
ON (target.ProductID = source.ProductID)
WHEN MATCHED THEN
UPDATE
SET Sold += 1
WHEN NOT MATCHED THEN
INSERT (ProductID, Sold)
VALUES (source.ProductID, 1);
END;
Understand and use the full power of T-SQL.
Most people know how to UNION result sets together, but do not know about INTERSECT and EXCEPT.
Also a lot of development effort can be saved by using T-SQL’s analytics extensions where appropriate:-
RANK()
DENSE_RANK()
NTILE()
ROW_NUMBER()
LEAD() and LAG() (introduced in Denali)
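For example, ROW_NUMBER() can rank rows within a group in a single set based statement, with no cursor or counter variable (query sketched against the AdventureWorks sales schema used throughout this deck):

```sql
/* Number each order line within its order by quantity, highest first. */
SELECT SalesOrderID,
       SalesOrderDetailID,
       OrderQty,
       ROW_NUMBER() OVER (PARTITION BY SalesOrderID
                          ORDER BY OrderQty DESC) AS QtyRank
FROM   Sales.SalesOrderDetail;
```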
Scalar functions are another example of RBAR, consider this function:-
CREATE FUNCTION udfMinProductQty ( @ProductID int )
RETURNS int
AS
BEGIN
RETURN ( SELECT MIN(OrderQty)
FROM Sales.SalesOrderDetail
WHERE ProductId = @ProductID )
END;
Now let's call the function from an example query:-
SELECT ProductId,
dbo.udfMinProductQty(ProductId)
FROM Production.Product
Elapsed time = 00:00:00.746
Now doing the same thing, but using an inline table valued function:-
CREATE FUNCTION tvfMinProductQty (
@ProductId INT
)
RETURNS TABLE
AS
RETURN (
SELECT MIN(s.OrderQty) AS MinOrdQty
FROM Sales.SalesOrderDetail s
WHERE s.ProductId = @ProductId
)
Invoking the inline TVF from a query:-
SELECT ProductId,
(SELECT MinOrdQty
FROM dbo.tvfMinProductQty(ProductId) ) MinOrdQty
FROM Production.Product
ORDER BY ProductId
Elapsed time 00:00:00.330
Leverage functionality already in SQL Server rather than reinventing it; this will lead to:-
More robust code
Less development effort
Potentially faster code
Code with better readability
Easier to maintain code
A scenario that actually happened:-
A row is inserted into the customer table
Customer table has a primary key based on an identity column
@@IDENTITY is used to obtain the key value of the customer row inserted for the creation of an order row with a foreign key linking back to customer.
The identity value obtained is nothing like the one for the inserted row – why ?
@@IDENTITY obtains the latest identity value irrespective of the session it came from.
In the example the replication merge agent inserted a row in the customer table just before @@IDENTITY was used.
The solution: always use SCOPE_IDENTITY() instead of @@IDENTITY.
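A sketch of the fix; the Customer and SalesOrder table names here are illustrative:

```sql
DECLARE @CustomerId int;

INSERT INTO dbo.Customer (CustomerName)
VALUES (N'Acme Ltd');

/* SCOPE_IDENTITY() returns the last identity value generated in THIS
   scope; @@IDENTITY can return a value generated elsewhere, e.g. by a
   trigger or, as in the scenario above, the replication merge agent. */
SET @CustomerId = SCOPE_IDENTITY();

INSERT INTO dbo.SalesOrder (CustomerId, OrderDate)
VALUES (@CustomerId, GETDATE());
```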
Developing applications that use the database and perform well depends on good:-
Schema design
Compiled statement plan reuse.
Connection management.
Minimizing the number of network round trips between the database and the tier above.
Parameterise your queries in order to minimize compiling.
BUT, watch out for “Parameter sniffing”.
At runtime the database engine will sniff the values of the parameters a query is compiled with and create a plan accordingly.
Unfortunate when the sniffed values result in plans with table scans, while the 'popular' values would lead to plans with index seeks.
Use the RECOMPILE hint to force the creation of a new plan.
Use the OPTIMIZE FOR hint in order for a plan to be created for the 'popular' values you specify.
Use the OPTIMIZE FOR UNKNOWN hint to cause a "general purpose" plan to be created.
Copy parameters passed into a stored procedure to local variables and use those in your query.
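Sketches of those hints in context; the procedure name and the 'popular' value 11000 are illustrative:

```sql
CREATE PROCEDURE uspOrdersForCustomer (@CustomerID int) AS
BEGIN
    /* Option 1: force a fresh plan on every execution. */
    SELECT SalesOrderID, OrderDate
    FROM   Sales.SalesOrderHeader
    WHERE  CustomerID = @CustomerID
    OPTION (RECOMPILE);

    /* Option 2: compile the plan for a specified 'popular' value. */
    SELECT SalesOrderID, OrderDate
    FROM   Sales.SalesOrderHeader
    WHERE  CustomerID = @CustomerID
    OPTION (OPTIMIZE FOR (@CustomerID = 11000));

    /* Option 3: ask for a general purpose plan built from average
       column density rather than the sniffed parameter value. */
    SELECT SalesOrderID, OrderDate
    FROM   Sales.SalesOrderHeader
    WHERE  CustomerID = @CustomerID
    OPTION (OPTIMIZE FOR (@CustomerID UNKNOWN));
END;
```

In practice a procedure would use only one of these three statements; they are shown together for comparison.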
For OLTP style applications:-
Transactions will be short
Number of statements will be finite
SQL will only affect a few rows for each execution.
The SQL will be simple.
Plans will be skewed towards using index seeks over table scans.
Recompiles could more than double query execution time.
Therefore recompiles are undesirable for OLTP applications.
For OLAP style applications:-
Complex queries that may involve aggregation and analytic SQL.
Queries may change constantly due to the use of reporting and BI tools.
May involve WHERE clauses with potentially lots of combinations of parameters.
Taking the hit of a recompile via OPTION(RECOMPILE) may be worth it for the benefit of a significant reduction in total execution time.
This is the exception to the rule.
Be careful when using table variables.
Statistics cannot be gathered on these
The optimizer will always assume they only contain one row.
This can lead to unexpected execution plans.
Table variables will always inhibit parallelism in execution plans.
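A sketch of the contrast; with the table variable below the optimizer assumes one row when compiling the join, whereas the temporary table gets statistics and a realistic row estimate:

```sql
/* Table variable: no statistics, cardinality estimate of 1 row. */
DECLARE @Ids TABLE (SalesOrderID int PRIMARY KEY);

INSERT INTO @Ids
SELECT SalesOrderID FROM Sales.SalesOrderHeader;

SELECT d.ProductID, d.OrderQty
FROM   @Ids i
JOIN   Sales.SalesOrderDetail d ON d.SalesOrderID = i.SalesOrderID;

/* Temporary table: statistics are created on the key, so the same
   join would typically get a more appropriate execution plan. */
CREATE TABLE #Ids (SalesOrderID int PRIMARY KEY);
```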
This applies to conditions in WHERE clauses.
If a WHERE clause condition can use an index, it is said to be 'sargable' (a contraction of "searchable argument").
As a general rule of thumb, the use of a function on a column will suppress index usage:
i.e. WHERE ufn(MyColumn1) = <somevalue>
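For example, a predicate on a date column can usually be rewritten so that the column is left bare and any manipulation is moved to the constant side (the column and date values here are illustrative):

```sql
/* Non-sargable: the function wrapped around the column
   suppresses the use of any index on OrderDate. */
SELECT SalesOrderID
FROM   Sales.SalesOrderHeader
WHERE  YEAR(OrderDate) = 2007;

/* Sargable equivalent: the bare column allows an index
   on OrderDate to be seeked. */
SELECT SalesOrderID
FROM   Sales.SalesOrderHeader
WHERE  OrderDate >= '20070101'
AND    OrderDate <  '20080101';
```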
Constructs that will always force a serial plan:-
All T-SQL user defined functions.
All CLR user defined functions with data access.
Built in functions including: @@TRANCOUNT, ERROR_NUMBER() and OBJECT_ID().
Dynamic cursors.
Constructs that will always force a serial region within a plan:-
Table valued functions
TOP
Recursive queries
Multi consumer spool
Sequence functions
System table scans
"Backwards" scans
Global scalar aggregate
Make stored procedures and functions relatively single minded in what they do.
Stored procedures and functions with lots of arguments are a “Code smell” of code that:-
Is difficult to unit test with a high degree of confidence.
Does not lend itself to code reuse.
Smacks of poor design.
An ‘Ordinal’ in the context of the ORDER BY clause is when numbers are used to represent column positions.
If new columns are added or the column order is changed in the SELECT list, this query will return different results, potentially breaking the application using it.
SELECT TOP 5
[SalesOrderNumber]
,[OrderDate]
,[DueDate]
,[ShipDate]
,[Status]
FROM [AdventureWorks].[Sales].[SalesOrderHeader]
ORDER BY 2 DESC
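The same query made safe by ordering on the column name rather than the ordinal:

```sql
SELECT TOP 5
       [SalesOrderNumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
      ,[Status]
FROM   [AdventureWorks].[Sales].[SalesOrderHeader]
ORDER BY [OrderDate] DESC
```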
SELECT * retrieves all columns from a table:
Bad for performance if only a subset of these is required.
Referencing columns explicitly by name leads to improved code readability.
Code is easier to maintain, as it enables the developer to see in situ which columns a query is using.