Connect with life
www.connectwithlife.co.in

Integrated Full Text Search (iFTS) with SQL Server 2008

Nauzad Kapadia
Quartz Systems
nauzadk@quartzsystems.com

Session Objectives And Takeaways

Session Objective(s):
• Discover what we have learned from using the new Integrated Full Text Search
• Find out what works and what doesn't work

Takeaways:
• iFTS is faster than SQL 2005 full text
• A great base for future improvements

Background

• Full-text search in earlier versions was based on MSSearch
• Lack of integration limited the performance of queries
• It also limited the ability to integrate with high availability and scalability functionality

New Architecture

Demo – Using Full Text Search
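
For readers without access to the demo, here is a minimal sketch of a basic full-text setup and query in SQL Server 2008. It is not the demo itself. The table dbo.Documents, its columns, the key index PK_Documents and the catalog DemoCatalog are hypothetical names chosen for illustration.

```sql
-- Hypothetical demo table. The full-text key must be a unique,
-- single-column, non-nullable index (here the primary key).
CREATE TABLE dbo.Documents
(
    DocumentId INT           NOT NULL CONSTRAINT PK_Documents PRIMARY KEY,
    Title      NVARCHAR(200) NOT NULL,
    Body       NVARCHAR(MAX) NOT NULL
);

-- Hypothetical catalog name.
CREATE FULLTEXT CATALOG DemoCatalog;

-- Index both text columns with the English (LCID 1033) word breaker.
CREATE FULLTEXT INDEX ON dbo.Documents
(
    Title LANGUAGE 1033,
    Body  LANGUAGE 1033
)
KEY INDEX PK_Documents
ON DemoCatalog
WITH CHANGE_TRACKING AUTO;

-- Precise matching: a phrase plus a single word.
SELECT DocumentId, Title
FROM dbo.Documents
WHERE CONTAINS(Body, N'"full text" AND search');

-- Ranked, looser matching: join the rowset function back to the table.
SELECT d.DocumentId, d.Title, ft.[RANK]
FROM FREETEXTTABLE(dbo.Documents, Body, N'integrated full text search') AS ft
JOIN dbo.Documents AS d ON d.DocumentId = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```

CONTAINS suits exact terms and phrases, while FREETEXTTABLE returns a rank that can be used for ordering results.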

StopLists

• New STOPLIST support
• Simplified noise word utilization and manageability
• DB object associated with the FT index

CREATE FULLTEXT STOPLIST stoplist_name
[ FROM { [ database_name. ] source_stoplist_name } | SYSTEM STOPLIST ]
[ AUTHORIZATION owner_name ]

ALTER FULLTEXT STOPLIST stoplist_name
{
    ADD <keyword> LANGUAGE language_term
  | DROP
    {
        <keyword> LANGUAGE language_term
      | ALL LANGUAGE language_term
      | ALL
    }
}
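
The syntax above might be used as follows. This is a sketch under assumptions: the stoplist name DemoStoplist, the stopwords, and the table dbo.Documents (assumed to already have a full-text index) are hypothetical. Stoplists also require the database to be at compatibility level 100.

```sql
-- Start from a copy of the system stoplist (hypothetical name DemoStoplist).
CREATE FULLTEXT STOPLIST DemoStoplist FROM SYSTEM STOPLIST;

-- Add stopwords; the language can be given by name or by LCID.
ALTER FULLTEXT STOPLIST DemoStoplist ADD N'connect' LANGUAGE 'English';
ALTER FULLTEXT STOPLIST DemoStoplist ADD N'life'    LANGUAGE 1033;

-- Remove a single stopword again.
ALTER FULLTEXT STOPLIST DemoStoplist DROP N'life' LANGUAGE 1033;

-- Associate the stoplist with an existing full-text index
-- (dbo.Documents is a hypothetical, already indexed table).
ALTER FULLTEXT INDEX ON dbo.Documents SET STOPLIST DemoStoplist;

-- Inspect the stopwords the list now contains for English.
SELECT sw.stopword, sw.language
FROM sys.fulltext_stopwords AS sw
JOIN sys.fulltext_stoplists AS sl ON sl.stoplist_id = sw.stoplist_id
WHERE sl.name = N'DemoStoplist'
  AND sw.language_id = 1033;
```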

Demo – Stop Lists and Creating FullText Indexes

Thesaurus

• Thesaurus improvements

• Stored in internal tables (in tempdb) in XML form instead of being parsed from external files

• Instance level thesaurus

sys.sp_fulltext_load_thesaurus_file (lcid)

Loads all the data specified in the Thesaurus XML corresponding to the language with specified lcid.
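
A short sketch of reloading the English thesaurus and checking its effect. The term "IE" assumes the thesaurus file has an expansion set defined for it, and dbo.Documents is a hypothetical full-text indexed table; neither comes from the slides.

```sql
-- Reload the thesaurus for English (LCID 1033) after editing its XML file.
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- Check how a term is expanded without touching any index.
-- expansion_type 4 marks rows produced by the thesaurus.
-- "IE" is an assumed entry in the thesaurus file.
SELECT display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(THESAURUS, "IE")', 1033, 0, 0);

-- Use the thesaurus in a query against a hypothetical indexed table.
SELECT DocumentId, Title
FROM dbo.Documents
WHERE CONTAINS(Body, N'FORMSOF(THESAURUS, "IE")');
```

FREETEXT queries apply the thesaurus automatically, so FORMSOF(THESAURUS, ...) is only needed with CONTAINS.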

Demo - Thesaurus

SQL Server 2008: IFTS

• New family of word breakers (WBs): WBs are the components responsible for parsing the textual data in a given language and passing the tokenized result to the full-text index.

• 51 languages/WBs out of the box

• Improved quality in many already existing word-breakers
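
To see which word breakers an instance actually has, and how two of them tokenize the same text, something like the following sketch can be used. The sample phrase is an arbitrary illustration, and sys.dm_fts_parser requires sysadmin membership.

```sql
-- Languages (word breakers) registered for full-text on this instance.
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;

-- Compare the tokens produced by the English (1033) and Neutral (0)
-- word breakers for the same phrase. Arguments: query string, LCID,
-- stoplist id (0 = system stoplist), accent sensitivity (0 = insensitive).
SELECT 'English' AS word_breaker, occurrence, display_term, special_term
FROM sys.dm_fts_parser(N'"state-of-the-art full-text search"', 1033, 0, 0)
UNION ALL
SELECT 'Neutral', occurrence, display_term, special_term
FROM sys.dm_fts_parser(N'"state-of-the-art full-text search"', 0, 0, 0);
```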

SQL Server 2008: IFTS

• WBs available in SQL Server 2008:

English, English UK, Simplified Chinese, Traditional Chinese, Chinese (Hong Kong), Chinese (Macau), Chinese (Singapore), Thai, Korean, French, German, Japanese, Italian, Spanish, Bengali, Bulgarian, Catalan, Croatian, Neutral, Punjabi, Romanian, Serbian Cyrillic, Serbian Latin, Slovak, Slovenian, Tamil, Telugu, Ukrainian, Urdu, Lithuanian, Malay, Icelandic, Indonesian, Hindi, Gujarati, Vietnamese, Arabic, Norwegian, Portuguese, Brazilian, Russian, Dutch, Malayalam, Marathi, Hebrew, Kannada, Latvian, Swedish, Danish, Polish, Turkish

Categories marked on the slide:
• Languages present but disabled by default
• New languages supported in SQL Server 2008
• Existing in SQL Server 2005, and being replaced by new WBs in SQL Server 2008
• Unchanged language/WB from SQL Server 2005

Indexing Performance

• The indexing performance has improved in most scenarios

Workload                      2005 Crawl   2005 Total   IFTS Crawl   IFTS Total
20M rows, 1k text data        02:06        02:25        01:22        01:28
5M rows, 8k text data         02:10        02:41        02:22        02:32
20M rows, 1k nvarchar data    01:37        01:55        01:20        01:26

Measured on a 4-processor AMD64 at 2793 MHz with 8 GB RAM. Numbers are in HH:MM format. Total time combines the time to crawl and the time to merge into the index.

For some hardware configurations and data types, specific best practices are recommended to improve indexing performance (e.g. capping SQL Server's memory).
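
Capping SQL Server's memory can be done through sp_configure. The sketch below assumes a cap of 6144 MB, an arbitrary illustration for a machine with 8 GB of RAM; the right value depends on the workload and is not given in the slides.

```sql
-- 'max server memory (MB)' is an advanced option, so expose those first.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- Assumed cap of 6 GB, leaving the rest to the OS and the
-- full-text filter daemon host. Adjust to the actual machine.
EXEC sys.sp_configure N'max server memory (MB)', 6144;
RECONFIGURE;

-- Confirm the value now in use.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = N'max server memory (MB)';
```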

Helpful Commands

• To see the word frequency
  • sys.dm_fts_index_keywords()
  • sys.dm_fts_index_keywords_by_document()
• Get number and size of fragments
  • sys.fulltext_index_fragments
• Understanding query behavior
  • sys.dm_fts_parser('"This is test" AND "This is also a test"', 1033, 0, 0)
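
The first two groups of commands might be called as in the following sketch. dbo.Documents is a hypothetical full-text indexed table in the current database, and the search term is arbitrary.

```sql
-- Word frequency: the 20 terms that appear in the most documents.
SELECT TOP (20) display_term, document_count
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID(N'dbo.Documents'))
ORDER BY document_count DESC;

-- Word frequency per document for one term.
SELECT document_id, occurrence_count
FROM sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID(N'dbo.Documents'))
WHERE display_term = N'search'
ORDER BY occurrence_count DESC;

-- Number and total size of queryable fragments per table.
-- Status 4 = closed and ready for query, 6 = merge input and ready for query.
SELECT OBJECT_NAME(table_id) AS table_name,
       COUNT(*)              AS fragment_count,
       SUM(data_size)        AS total_bytes
FROM sys.fulltext_index_fragments
WHERE status IN (4, 6)
GROUP BY table_id;
```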

Demo – Understanding Indexes

Upgrading

- Because of the new architecture, SQL Server 2008 uses new full-text indexes. Indexes from earlier versions are not compatible with SQL Server 2008.

Solution: Full-Text Catalog Upgrade Option

- Import (default): the faster method, although performance and semantic implications are possible.
- Rebuild: a slower method, but it guarantees the ideal final state of the new full-text catalogs.
- Reset: a faster upgrade method, but the full-text catalogs will not be available to your search application afterwards. You need to rebuild them when possible.

Possible upgrade methods:

1. In-place upgrade: the user is prompted to choose an upgrade option for existing full-text catalogs.
2. Restore/Attach: the instance-level setting is applied to former full-text catalogs brought in with the former database.
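
For the Restore/Attach path, the instance-level setting can be changed with sp_fulltext_service. A minimal sketch follows; the catalog name DemoCatalog is hypothetical.

```sql
-- Instance-level upgrade option applied when a database from an
-- earlier version is restored or attached:
--   0 = Rebuild, 1 = Reset, 2 = Import (default)
EXEC sys.sp_fulltext_service @action = N'upgrade_option', @value = 0;

-- After a Reset upgrade the catalogs are empty, so repopulate them
-- when the system has capacity (hypothetical catalog name).
ALTER FULLTEXT CATALOG DemoCatalog REBUILD;
```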

Best Practices

• Put the full-text index on a separate filegroup to avoid fragmenting the main data files
• Use varchar(max) instead of text/image
• If you see excessive blocking by FTGATHER
  • Turn off auto change tracking
  • Schedule a manual job to do updates
  • Watch the number of fragments to determine how often to schedule
• Don't run full-text master merges at the same time as other index rebuilds or reorgs
• If you have large documents (>2 MB), you may need to reduce SQL Server memory a bit so the FDHost daemon has memory to run
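
Several of these practices map directly to T-SQL. The sketch below assumes a hypothetical table dbo.Documents with key index PK_Documents, a catalog DemoCatalog and a filegroup FTData that already exists in the database.

```sql
-- Place a new full-text index on its own filegroup and create it
-- with manual change tracking from the start.
CREATE FULLTEXT INDEX ON dbo.Documents (Body LANGUAGE 1033)
KEY INDEX PK_Documents
ON (DemoCatalog, FILEGROUP FTData)
WITH CHANGE_TRACKING MANUAL;

-- For an index that already exists with AUTO tracking,
-- switch it to manual instead.
ALTER FULLTEXT INDEX ON dbo.Documents SET CHANGE_TRACKING MANUAL;

-- Step for a scheduled job: apply the tracked changes.
ALTER FULLTEXT INDEX ON dbo.Documents START UPDATE POPULATION;

-- Run the master merge in a quiet window, away from other
-- index rebuilds or reorganizations.
ALTER FULLTEXT CATALOG DemoCatalog REORGANIZE;
```

How often to run the update and the merge is a judgment call. Watching the fragment count in sys.fulltext_index_fragments between runs is one way to tune the schedule.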

Related Content

Webcasts
• MSDN Webcast: Using Full-Text Search in SQL Server Express
  http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032294979&Culture=en-US

• White Paper on SQL Server FTS 2005
  http://msdn2.microsoft.com/en-us/library/ms345119.aspx
• MSDN
  http://msdn2.microsoft.com/en-us/library/ms142571.aspx
• Technical Case Study
  http://www.microsoft.com/technet/itshowcase/content/intdocmgmtsql2005.mspx
• FTS 2008 (iFTS) White Paper
  http://msdn.microsoft.com/en-us/library/cc721269(SQL.100).aspx

Program Manager on SQL Server [email protected]

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.