Blazing Fast Queries - When Indexes Are Not Enough
-
Upload
davide-mauri -
Category
Technology
-
view
351 -
download
0
description
Transcript of Blazing Fast Queries - When Indexes Are Not Enough
April 21 – 23. Neuss, Germany
w w w . s q l p a s s . o r g / e u 2 0 1 0
Blazing Fast Queries
When Indexes Are Not Enough
Davide Mauri, Solid Quality Mentors,
Blazing Fast Queries
About Me
MCDBA, MCAD, MCT
Microsoft SQL Server MVP
Works with SQL Server from 6.5
Specialized in BI, Database Tuning
President of UGISS (Italian SQL Server UG)
Mentor @ Solid Quality MentorsItalian Subsidiary
Blazing Fast Queries
A developer story
I’ve started as a developer, I hated SQL….
…at some point my applications started to show performance and scalability problems...
(more successful they was more problems I had)
…I started to optimized the application finding clever ideas to gain performances
…the application became more and more complex to maintain
(and thus much less cost effective and profitable)
Blazing Fast Queries
…and the story ends
At some point I decided that maybe I should check if something can be done also on the database….
…I started learning database and SQL….
…and I discovered that using correct indexes and writing correct queries I could gain more performance and scalability than what I can ever dream of!
So, now everyone told you about indexes….I’ll tell you about queries and the fabulous secret technique of….
Set Based Thinking
Blazing Fast Queries
Set Based Thinking
Thinking in Sets allows you to leverage all the power of database engine
Optimizer is clever then what you can imagine!
Thinking in Sets means that you have to look at the properties that a set has and that you can use to solve your problem
Thinking in Sets means that we can find the solution from a purely logical perspective and then translate it into SQL commands
After all SQL is Logic (cit. Itzik Ben-Gan)
Blazing Fast Queries
Set Based Thinking
Thinking in Sets means that you can use a declarative approach instead of a procedural one
Ask to the database what you want, not how to get it
Let the optimizer do his work!
We have a problem:
Row-by-Rows solutions are usually what comes to our mind as a first solution
Since we’re used to think at how we can solve the problem with the programming language we will use to implement it
Blazing Fast Queries
Set Based Thinking
So, we start to implement this first-on-mind solution immeditaly...
...and then fix all the problems that arises from it
It does not scale
It does not offer performance
Solution:
Stop doing this brute-force-approach and try to find the best algorithm (independent from programming language) that can solve the problem
Unlearn to think row-by-row!
Blazing Fast Queries
Set Based Thinking
At the first time it’s difficult…
…but everything is difficult at the first time!This not:This is easy: But it should be
quite clear who will
win on a competition
All we need is
Exercising!
Blazing Fast Queries
Let’s start exercising
I’ll propose some typical real world scenario
All problems derives from situation I faced during consultancy
Then we’ll see how we can solve them using a purely set-based approach
And then we’ll compare some of them to the “easier” row-by-row approach
Blazing Fast Queries
What is NOT a set-based approach?
• Using cursors to iterate over all rows, of course
• Calling a scalar UDF that reads rows from a table, for each row in a SELECT statement
• Calling a rowset UDF using CROSS/OUTER APPLY
(except for INLINE UDFs)
• Using a SELECT to define the value of a column for each row in a SELECT statement
• WHILE Loops
Blazing Fast Queries
Let’s start! 1st problem: Loans
Knowing the value of the requested loan, interests and number for rates, generate a row for each rate, from the beginning up until the end of the loan
Loan Value # Rates Payment Frequency % Interest € 10.000,00 12 1 5,00% € 20.000,00 12 6 6,00% € 30.000,00 12 6 5,50%
Rate Date € 875,00 January-10 € 875,00 February-10 € 875,00 March-10 € 875,00 April-10 € 875,00 May-10 € 875,00 June-10 € 875,00 July-10 € 875,00 August-10 € 875,00 September-10 € 875,00 October-10 € 875,00 November-10 € 875,00 December-10
Blazing Fast Queries
Loans
Demo
Blazing Fast Queries
Groups and Correlated Values
Given a list of machines statuses, tell the latest reported status of each machine
ControllerABC
Controller Status TimeStampA StandBy 1001B StandBy 1001C StandBy 1003D Working 1004A Working 1004B StandBy 1006D StandBy 1009D Working 1010D Working 1011
Blazing Fast Queries
Groups and Correlated Values
Demo
Blazing Fast Queries
Stocks
Given a table with stock transactions get the High, Low, Open, Close & Volume values for each stock, each hour, each daySymbolID TransactionDateTime Price Volume
1 2008-01-01 09:15:21.000 75.800 25891 2008-01-01 09:25:44.000 68.200 43861 2008-01-01 09:29:31.000 74.300 28371 2008-01-01 09:34:42.000 68.900 29371 2008-01-01 09:39:13.000 72.300 45131 2008-01-01 09:43:35.000 67.300 8381 2008-01-01 09:51:57.000 73.800 13801 2008-01-01 09:56:42.000 68.700 4190
tran_hour from_datetime to_datetime symbol_id high low volume open close
92008-01-01 09:15:21.000
2008-01-01 09:56:42.000 1 75.800 67.300 23670 75.800 68.700
Blazing Fast Queries
Stocks
Demo
Blazing Fast Queries
Performance Comparison
10000 100000 10000000
1000000
2000000
3000000
4000000
5000000
6000000
7000000
47 426 754740790510442
5980523
I/O Reads
Reads Set-BasedReads Row-By-Row
I/Os
Rows
Blazing Fast Queries
Performance Comparison
10000 100000 10000000
5000
10000
15000
20000
25000
30000
35000
40000
134 970
12534
4043034
33355
Time (msec)
Time (msec) Set-BasedTime (msec) Row-By-Row
Time
Rows
Blazing Fast Queries
Free Seats / Free Warehouse Spaces
Give a group of person of defined size, find in which row in a cinema (or theater, or stadium, or whatever ), there is enough free space to make the seat near each other
Blazing Fast Queries
Free Seats / Free Warehouse Spaces
Blazing Fast Queries
Free Seats – Row by Row
3 4 5 6 7 9A. Set the “free seat counter” (FSC) to 1B. Start from the first free seat (n), store this number as
First Free Seat (FFS)C. If next free seat is n+1 then increase the FSCD. If next free seat is not n+1, store it somehere along with FFSE. restart from point A
1 2 5 1
Blazing Fast Queries
Free Seats
Demo
Row By Row Solution
Blazing Fast Queries
Free Seats – Set Based
A. If SQL Server could see the picture above would be able to see thatthere are groups of free spaces
Then we could simply use “GROUP BY”B. We need turn this “for-eyes-only” this property into something
SQL Server can handle
Blazing Fast Queries
Free Seats – Set Based
A. Let’s enumerate all the seats (free and occupied)B. And then enumerate all the free seatsC. Do a simple difference….D. And now we can use a GROUP BY
1 2 3 4 5 6 7 8 9 10 …
1 2 3 4 5 6 7 …2 2 2 2 2 3 3
Blazing Fast Queries
Free Seats
Demo
Set Based Solution
Blazing Fast Queries
Performance Comparison
30 300 30000
5000
10000
15000
20000
25000
30000
35000
40000
20 36 176381
3630
36098
I/O
Set BasedCursor
Blazing Fast Queries
Performance Comparison
30 300 30000
500
1000
1500
2000
2500
3000
3500
4000
9 22 11387
469
3516
Time (ms)
Set BasedCursor
Blazing Fast Queries
Value Packing
Now, from the same machines statuses table, we need to “pack” all the ranges
Controller Status TimeStampA StandBy 1001B StandBy 1002B StandBy 1003B StandBy 1004A Working 1005A Working 1006A Working 1007B Working 1008B Working 1009A StandBy 1010A StandBy 1011B StandBy 1012
Controller Status From ToA StandBy 1001 1001B StandBy 1002 1004A Working 1005 1007B Working 1008 1009A StandBy 1010 1011B StandBy 1012 1012
Blazing Fast Queries
Value Packing
Demo
Blazing Fast Queries
Temporal Problems
Set Based solution comes to the rescue also in temporal problems
Packing Problems
Get the minimum and the maximum date of validity of something
Auditing or Temporal Denormalization Problems
Given tables that hold data with different valid intervals, create a result that holds all the valid combination in time
Blazing Fast Queries
Temporal – Auditing/Denormalization
Give tables that contains validity period for attributes that changes of over time, generate a result that shows all the combinations that existed at some point in time
Value From To
Phone 1 2007-12-19 2008-02-02
Phone 2 2008-02-03 now
Value From To
Addresses 1 2007-12-01 2007-12-31
Addresses 2 2008-01-01 now
Phone Number Changes Address Changes
2007-12-01 2008-01-01
2007-12-19 2008-02-03
Timeline
Blazing Fast Queries
Temporal Auditing / Denormalization
Demo
Blazing Fast Queries
Conclusions
Set Based Thinking is hard….
…but deserves our attention!
And after all row-by-row solution is not that easy!
We can spend more time in thinking instead of writing ugly code
We’ve been able to optimize our query making them up to 30 times faster!
Stop crawling, and start to run!
Big customers will be very satisfied with us!
Blazing Fast Queries
Is all Gold what shines?
YES!!!
…in Theory
In 99.9% a set based solution is preferable
But in some very rare (one?) case is still not the best
The problem is that the ACTUAL implementation of T- SQL in not complete and doesn’t allow to easily get “previous” or “next” values
Vote on Connect!!!
Blazing Fast Queries
Additional Resources
Itzik Ben-Gan & Friends books:• SQL Server 2008 Fundamentals• SQL Server 2008 Querying• SQL Server 2008 Programming
Thinking in Sets• Joe Celko
SQL and Relational Theory• Chris Date
Questions
Thank you!
Don’t forget to fill out your session evaluation forms