Deep Dive on new Search Features in Denali CTP1

Naveen Garg, Principal Program Manager, Microsoft Corporation

Search Improvements

FullText Search

• Revamped Codebase for Significant Performance and Scale Improvement

• New Property-scoped Search

• Customizable NEAR Search

Performance & Scale Goals

• Scale up to 350M documents

• Queries orders of magnitude faster than the 2008 release

• Worst-case query response time < 3 sec

• On par with or better than key DBMS players

Code Investments - Versioned Updates

• SQL 2008 Design Issue

– Queries block updates to the internal table that maintains the state of the index with respect to document updates (such as for auto change tracking)

• Denali Investment

– Track batch commits in order

– Track lowest timestamp below which all batches are committed

– Select index data for query and merge from below this timestamp

– Queries block only the update of the lowest timestamp (instead of blocking index updates)
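
The timestamp tracking described on this slide can be sketched in a few lines. This is a minimal single-threaded illustration, not the engine's code: the class name `CommitWatermark` and its methods are hypothetical, and real locking is left out. It only shows how a "lowest timestamp below which all batches are committed" advances when batches commit out of order, and why a query needs to hold still only that one value.

```python
class CommitWatermark:
    """Tracks the lowest timestamp below which every batch is committed."""

    def __init__(self):
        self.next_ts = 1        # timestamp handed to the next batch
        self.committed = set()  # committed timestamps not yet below the watermark
        self.watermark = 1      # every batch with ts < watermark is committed

    def begin_batch(self):
        """Hand out timestamps in increasing order."""
        ts = self.next_ts
        self.next_ts += 1
        return ts

    def commit(self, ts):
        """Record a commit, then advance the watermark over contiguous commits."""
        self.committed.add(ts)
        # Advancing the watermark is the only step a running query would
        # have to block; writing index data for the batch is not blocked.
        while self.watermark in self.committed:
            self.committed.remove(self.watermark)
            self.watermark += 1

    def visible(self, ts):
        """A query reads (and a merge consumes) only batches below the watermark."""
        return ts < self.watermark


w = CommitWatermark()
a, b, c = w.begin_batch(), w.begin_batch(), w.begin_batch()
w.commit(b)                      # b commits first, but a is still open
print(w.watermark, w.visible(b)) # watermark has not moved past a
w.commit(a)                      # now a and b are both below the watermark
print(w.watermark, w.visible(b), w.visible(c))
```

In this sketch a query that captured the watermark at its start sees a stable set of batches even while later batches keep committing.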

Code Investments – Single STVF for Query

Goals

• Improve Query Execution Performance

• Lower costs, better plans and hints

• Better code organization

Code Changes

• Query Preparation

– Rewrite CONTAINS/FREETEXT in terms of CONTAINSTABLE/FREETEXTTABLE during binding

– Rewrite it as SELECT [TOP N] key, score FROM STVF [ORDER BY score] during QP prepare

• Compilation

– Parse parameters to a tree and bind specific columns, Word breaking

– Tree Expansion with appropriate AND/OR, Noise word filtering

– Tree Reduction, Load Stats

• Execution

– Transform to execution tree including Ranking function

– Iterate to produce resulting rows

Code Investments - Predicate Folding

• Multiple CONTAINS, FREETEXT predicates folded together, e.g.

– CT(1) AND CT(2) AND CT(3) => CT(1 AND 2 AND 3)

– CT(1) AND CT(2) OR CT(3) => CT(1 AND 2 OR 3)

– CT(1) AND NOT (NOT CT(2) OR CT(3)) => CT(1 AND 2 AND NOT 3)

– CT(1) AND (CT(2) OR i=10 OR CT(3)) => CT(1) AND (CT(2 OR 3) OR i=10)

• Except…

– CT(1) AND NOT (CT(2) AND CT(3)) => CT(1) AND (NOT CT(2) OR NOT CT(3)): NO folding
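
The folding rules above can be reproduced with a small expression-tree rewrite. The sketch below is an illustration only, not the optimizer's implementation: the node classes (`CT`, `Rel`, `Not`, `And`, `Or`) and the functions `nnf` and `fold` are hypothetical names. It pushes NOT down with De Morgan's laws, then merges sibling full-text predicates. It assumes the exception case stays unfolded because a CONTAINS search condition accepts AND NOT but has no OR NOT form. Compound sub-conditions are parenthesized when merged, so the second rule comes out as `(1 AND 2) OR 3`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CT:            # one full-text predicate; q is its search condition
    q: str

@dataclass(frozen=True)
class Rel:           # any relational predicate, e.g. i=10
    text: str

@dataclass(frozen=True)
class Not:
    arg: object

@dataclass(frozen=True)
class And:
    args: tuple

@dataclass(frozen=True)
class Or:
    args: tuple


def nnf(node, negate=False):
    """Push NOT down to the leaves using De Morgan's laws."""
    if isinstance(node, Not):
        return nnf(node.arg, not negate)
    if isinstance(node, (And, Or)):
        cls = type(node)
        if negate:
            cls = Or if cls is And else And
        return cls(tuple(nnf(a, negate) for a in node.args))
    return Not(node) if negate else node


def flatten(cls, args):
    """Splice nested nodes of the same type: And(a, And(b, c)) -> a, b, c."""
    out = []
    for a in args:
        out.extend(flatten(cls, a.args) if isinstance(a, cls) else [a])
    return out


def paren(q):
    return f"({q})" if " " in q else q


def fold(node):
    """Merge sibling full-text predicates under one AND/OR into a single CT."""
    if not isinstance(node, (And, Or)):
        return node
    cls = type(node)
    kids = [fold(k) for k in flatten(cls, node.args)]
    pos = [k for k in kids if isinstance(k, CT)]
    # Negated predicates can only be absorbed under AND (as AND NOT).
    neg = [k for k in kids
           if cls is And and isinstance(k, Not) and isinstance(k.arg, CT)]
    if not pos or len(pos) + len(neg) < 2:
        return cls(tuple(kids))          # nothing to fold at this level
    op = " AND " if cls is And else " OR "
    q = op.join(paren(c.q) for c in pos)
    q += "".join(f" AND NOT {paren(n.arg.q)}" for n in neg)
    rest = [k for k in kids if k not in pos and k not in neg]
    return cls((CT(q), *rest)) if rest else CT(q)


one, two, three = CT("1"), CT("2"), CT("3")
print(fold(nnf(And((one, Not(Or((Not(two), three))))))))   # folds into one CT
print(fold(nnf(And((one, Not(And((two, three))))))))        # stays unfolded
```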

Code Investments – Query Parallelism

Goals

• Retain basic assumptions to avoid complete rewrite

• Scale to 1.6x latency reduction for doubling the cores

• Work well on both NUMA and UMA architectures

Changes

• Query Optimizer and Execution updates to allow fulltext query parallelism

• Fulltext STVF Updates to support multiple threads per query

– Use DocID histogram to slice doc ranges for each thread

– Rebuild Autostats as part of background/master merge
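
Slicing document ranges from a DocID histogram can be pictured as cutting the cumulative document count into roughly equal parts, one per thread. The sketch below is one possible reading of that bullet, not the engine's algorithm. The histogram layout (a list of `(upper_doc_id, doc_count)` buckets sorted by DocID) and the function name `slice_doc_ranges` are assumptions made for illustration; DocIDs are assumed to be positive.

```python
def slice_doc_ranges(histogram, n_threads):
    """Split the DocID space into up to n_threads ranges of similar doc count.

    histogram: list of (upper_doc_id, doc_count) buckets, sorted by DocID.
    Returns (lo, hi] ranges: lo is exclusive, hi is inclusive.
    """
    total = sum(count for _, count in histogram)
    target = total / n_threads          # ideal number of docs per thread
    ranges, lo, seen = [], 0, 0
    for upper, count in histogram:
        seen += count
        # Close a slice once the running total reaches the next multiple
        # of the target; the last slice is closed after the loop.
        if len(ranges) < n_threads - 1 and seen >= target * (len(ranges) + 1):
            ranges.append((lo, upper))
            lo = upper
    if lo < histogram[-1][0]:
        ranges.append((lo, histogram[-1][0]))
    return ranges


# Skewed data: most documents sit in the second bucket.
print(slice_doc_ranges([(10, 10), (20, 90), (30, 50), (40, 50)], 2))
```

Cuts can only fall on bucket boundaries, so the balance between threads is only as good as the histogram is fine-grained. That is one plausible reason to refresh statistics during master merge, as the last bullet describes.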

Summary Of Code Improvements

• Faster Execution

– Numerous code and data layout improvements

– No blocking during high index update workloads

– Improved mixed relational query processing

– Optimize Top N by Rank (10x: select top 1K by score for keyword in 1M docs, 250ms -> 28ms)

• Leverage CPU

– Cache for Operators and Core Algorithms (batch decompression and rank computation, virtual functions)

– Vector CPU instructions (SSE*) for scalar computations (Ranking, TOP N, and Stale Test as major beneficiaries)

• Leverage multicore

– Parallel Query execution

– Parallel Master Merge

* SSE: Streaming SIMD (Single Instruction Multiple Data) Extensions
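
The "Top N by Rank" item can be pictured as keeping only the N best-scoring rows while scanning, rather than sorting every match. The slides do not say how the engine does this; the sketch below assumes a bounded min-heap, a common technique for the problem, and `top_n_by_score` is a hypothetical name.

```python
import heapq


def top_n_by_score(scored_docs, n):
    """Return the n highest-scoring (doc_id, score) pairs, best first.

    scored_docs: iterable of (doc_id, score) pairs in any order.
    """
    if n <= 0:
        return []
    heap = []  # min-heap of (score, doc_id), never larger than n
    for doc_id, score in scored_docs:
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # New row beats the weakest kept row: swap it in.
            heapq.heapreplace(heap, (score, doc_id))
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]


print(top_n_by_score([(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.7)], 2))
```

Memory stays proportional to N rather than to the number of matching documents, which matters when a keyword matches a large share of the corpus.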

Query Throughput on 350M Documents

[Figure: two line charts comparing SQL Server Denali and SQL Server 2005. X-axis: CPU (0 to 35); y-axis: Throughput (qps).]

Throughput (qps) with DML. Data labels on the chart, in extraction order: 0, 3.014, 10.501, 43.775, 119.017 and 0.597, 0.534, 0.385, 0.853, 1.364.

Throughput (qps) without DML. Data labels on the chart, in extraction order: 0, 3.009, 13.571, 64.825, 157.93 and 4.772, 8.147, 17.102, 48.27, 61.374.

Throughput & Execution Time on a Customer Workload

[Figure: two line charts comparing SQL Server Denali and SQL Server 2005 against Number of Connections (x-axis, 0 to 2500).]

Throughput/#Connections. Y-axis: Queries/Second (0 to 140).

AvgExecTime (ms)/#connections. Y-axis: AvgExecTime (ms) (0 to 70000).

Query Throughput on another Customer Workload

2X Query performance improvement compared with SQL Server 2005

[Figure: line chart "Scaling Queries/Seconds" comparing SQL Server Denali and SQL Server 2005. X-axis: Users (0 to 450); y-axis: Queries/Second (0 to 9).]

Performance & Scale Summary

• Index and query tested at scale up to 350 million documents with < ~2 sec response

– ~3X better throughput without DML and ~9X better throughput with DML

– Scales easily with increasing number of connections

• TAP customers already reporting significant performance improvements on their workloads

Property Scoped Search

• Load Office filters (needed once per database instance)

– EXEC sp_fulltext_service 'load_os_resources', 1;

– EXEC sp_fulltext_service 'restart_all_fdhosts';

• Create a property list

– CREATE SEARCH PROPERTY LIST p1;

• Add properties to be extracted

– ALTER SEARCH PROPERTY LIST [p1] ADD N'System.Author' WITH (PROPERTY_SET_GUID = 'f29f85e0-4ff9-1068-ab91-08002b27b3d9', PROPERTY_INT_ID = 4, PROPERTY_DESCRIPTION = N'System.Author');

• Create/alter the full-text index to specify the property list to be extracted

– ALTER FULLTEXT INDEX ON fttable... SET SEARCH PROPERTY LIST = [p1];

• Query for properties

– SELECT * FROM fttable WHERE CONTAINS(PROPERTY(ftcol, 'System.Author'), 'fernlope');

Identifying Property GUIDs

• Commonly known property GUIDs are documented in MSDN

• For the rest…

– Enable trace flag (TF) 7603

– Create and fully populate a full-text index with property search

– Check the error log for property GUIDs

– Recreate the index with the required properties

• OR use FiltDump.EXE (Windows SDK)

– Get property details, e.g.

Attribute = {F29F85E0-4FF9-1068-AB91-08002B27B3D9}\2 (System.Title)
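
A line in the form shown above carries the two values the property list needs: the property set GUID and the integer ID after the backslash. The sketch below turns such a line into an ALTER SEARCH PROPERTY LIST statement of the shape used earlier in the deck. It assumes FiltDump output always follows the single sample line on this slide; the regular expression and the helper names `parse_attribute` and `add_property_statement` are hypothetical.

```python
import re

# Matches: Attribute = {GUID}\<int id> (<property name>)
ATTRIBUTE_RE = re.compile(
    r"Attribute\s*=\s*\{(?P<guid>[0-9A-Fa-f-]{36})\}\\(?P<int_id>\d+)"
    r"\s*\((?P<name>[^)]+)\)"
)


def parse_attribute(line):
    """Return (guid, int_id, name) from one attribute line, or None."""
    m = ATTRIBUTE_RE.search(line)
    if m is None:
        return None
    return m["guid"].lower(), int(m["int_id"]), m["name"]


def add_property_statement(list_name, guid, int_id, name):
    """Build the T-SQL that registers the property in a search property list."""
    return (
        f"ALTER SEARCH PROPERTY LIST [{list_name}] ADD N'{name}' WITH "
        f"(PROPERTY_SET_GUID = '{guid}', PROPERTY_INT_ID = {int_id}, "
        f"PROPERTY_DESCRIPTION = N'{name}');"
    )


line = r"Attribute = {F29F85E0-4FF9-1068-AB91-08002B27B3D9}\2 (System.Title)"
guid, int_id, name = parse_attribute(line)
print(add_property_statement("p1", guid, int_id, name))
```

The generated text is meant for review before it is run; property names containing a single quote would need escaping, which this sketch does not attempt.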

Indexing Properties with Keywords

• Stored along with keywords but with additional Internal Property ID(s)
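
One way to picture this is a toy inverted index in which each posting carries a property ID next to the document ID, so a property-scoped query becomes a filter on that ID. Everything in the sketch is hypothetical (the `TinyIndex` class, the use of 0 for body text, the ID 1 for an author property); the actual index layout is not described on the slide.

```python
from collections import defaultdict

BODY = 0  # hypothetical property ID for text outside any registered property


class TinyIndex:
    def __init__(self):
        # keyword -> set of (doc_id, property_id) postings
        self.postings = defaultdict(set)

    def add(self, doc_id, text, property_id=BODY):
        """Index every word of text under the given property ID."""
        for word in text.lower().split():
            self.postings[word].add((doc_id, property_id))

    def search(self, word, property_id=None):
        """Return sorted doc IDs; restrict to one property if an ID is given."""
        hits = self.postings.get(word.lower(), set())
        return sorted({d for d, p in hits
                       if property_id is None or p == property_id})


AUTHOR = 1  # hypothetical internal ID assigned to System.Author
idx = TinyIndex()
idx.add(1, "quarterly report", BODY)
idx.add(1, "fernlope", AUTHOR)
idx.add(2, "notes reviewed by fernlope", BODY)
print(idx.search("fernlope"))          # any occurrence
print(idx.search("fernlope", AUTHOR))  # only where it is the author property
```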

Customizable ‘NEAR’ operator

• NEAR ( ( { <simple_term> | <phrase> | <prefix_term> } [,…n] ) [, <maximum_distance> [, <match_order> ] ] )

<maximum_distance> ::= { integer | MAX }

<match_order> ::= { TRUE | FALSE }

• E.g. resumes in the human resources DB containing the term "SQL Server" within no more than 5 words from "expertise":

SELECT candidate_name FROM Candidates WHERE CONTAINS(Resume, 'NEAR(("SQL Server", expertise), 5, FALSE)');

Customize Maximum Gap between terms/phrases when using NEAR operator

Customizable NEAR

• Search for documents with two words within a given distance of each other

Old NEAR Usage

SELECT * FROM fttable WHERE CONTAINS(*, 'test NEAR Space')

New NEAR Usages

• Specify Distance

SELECT * FROM fttable WHERE CONTAINS(*, 'NEAR((test, Space), 5, FALSE)')

• Reduce Distance

SELECT * FROM fttable WHERE CONTAINS(*, 'NEAR((test, Space), 2, FALSE)')

• Mandate Order of words

SELECT * FROM fttable WHERE CONTAINS(*, 'NEAR((test, Space), 5, TRUE)')
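
The effect of the two new arguments can be checked against a small model of the operator. The sketch below handles two single-word terms only and assumes that distance means the number of words strictly between the two terms; it splits on whitespace rather than using a word breaker, and the function name `near` is hypothetical. It is meant to show how `maximum_distance` and `match_order` change the outcome, not to mirror the engine.

```python
def near(text, first, second, max_distance, match_order=False):
    """True if both terms occur with at most max_distance words between them.

    With match_order=True, `first` must also appear before `second`.
    """
    words = text.lower().split()
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    for a in pos_first:
        for b in pos_second:
            if a == b:
                continue            # same word cannot serve as both terms
            if match_order and a > b:
                continue            # wrong order for an ordered match
            if abs(a - b) - 1 <= max_distance:
                return True
    return False


doc = "test the outer space program"
print(near(doc, "test", "space", 5))         # two words in between
print(near(doc, "test", "space", 1))         # gap too large for distance 1
print(near(doc, "space", "test", 5, True))   # order matters when TRUE
```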