SQL Server - Full text search
SQL Server 2008 for Developers: UTS Short Course
Specializes in
C# and .NET (Java not anymore)
Testing: automated tests
Agile, Scrum: Certified Scrum Trainer
Technology aficionado
• Silverlight
• ASP.NET
• Windows Forms
Peter Gfader
Course Timetable & Materials
http://www.ssw.com.au/ssw/Events/2010UTSSQL/
Resources
http://sharepoint.ssw.com.au/Training/UTSSQL/
Course Website
Course Overview
Session  Date                Time           Topic
1        Tuesday 03-08-2010  18:00 - 21:00  SQL Server 2008 Management Studio
2        Tuesday 10-08-2010  18:00 - 21:00  T-SQL Enhancements
3        Tuesday 17-08-2010  18:00 - 21:00  High Availability
4        Tuesday 24-08-2010  18:00 - 21:00  CLR Integration
5        Tuesday 31-08-2010  18:00 - 21:00  Full-Text Search
.NET
.NET FX
CLR
What we did last week: CLR Integration
Stored Proc
Functions
Triggers
Bottom Line
Use T-SQL for all data operations
Use CLR assemblies for any complex calculations and transformations
What we did last week: CLR Integration
Find all products that have a productnumber starting with BK
Find all products with "Road" in the name that are Silver
Find a list of products that have no review
Find the list price ([listprice]) of all products in our shop
What is the sum of the list price of all our products
Find the product with the maximum and minimum listprice
Find a list of products with their discount sale (hint see Sales.SalesOrderDetail)
Find the sum of prices of the products in each subcategory
Homework?
Session 5: SQL Server Full-Text Search
Using full-text search in SQL Server 2008
What is Full text search
The old way 2005
The new way 2008
How to
Querying
Agenda
SELECT * FROM [Northwind].[dbo].[Employees] WHERE Notes LIKE '%grad%'
What is Fulltext search
Allows searching for text/words in columns
Similar words
Plural of words
Based on special index
Full-text index (Full text catalog)
SELECT * FROM [Northwind].[dbo].[Employees] WHERE FREETEXT(*, 'grad')
What is REAL Fulltext search
Theory
Full-text index
Information about words and their location in columns
Used in full text queries
Full-text catalog
Group of full text indexes (Container)
Word breaker
Tokenizes text based on language
Full-Text Search Terminology 1/3
Token
Word identified by word breaker
Stemmer
Generate inflectional forms of a word (language specific)
Filter
Extract text from files stored in a varbinary(max) or image column
Population or Crawl
Creating and maintaining a full-text index.
Full-Text Search Terminology 2/3
Stopwords/Stoplists
Words that are not relevant to a search, e.g. ‘and’, ‘a’, ‘is’ and ‘the’ in English
Accent insensitivity
café = cafe
Full-Text Search Terminology 3/3
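The terminology above can be observed directly with the sys.dm_fts_parser function. This is a minimal sketch, assuming SQL Server 2008 or later, the English word breaker (LCID 1033) and a login in the sysadmin role; the sample sentence is made up for illustration.

```sql
-- Tokenize a phrase with the English word breaker (LCID 1033).
-- Third argument 0 = use the system stoplist, fourth argument 0 = accent-insensitive.
-- special_term marks stopwords as noise words.
SELECT keyword, display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"The riders and the bikes"', 1033, 0, 0);

-- List the inflectional forms the stemmer generates for a single word.
SELECT display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, ride)', 1033, 0, 0);
```

The first query returns one row per token, the second one row per generated word form.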
Fulltext search – Under the hood
The old way! SQL 2005
The new way! SQL 2008
How to: Administration
Administering Full-Text Search
Full-text administration can be separated into three main tasks:
Creating/altering/dropping full-text catalogs
Creating/altering/dropping full-text indexes
Scheduling and maintaining index population.
Administering Full-Text Search
sp_fulltext_catalog sp_help_fulltext_catalogs_cursor
sp_fulltext_column sp_help_fulltext_columns
sp_fulltext_database sp_help_fulltext_columns_cursor
sp_fulltext_service sp_help_fulltext_tables
sp_fulltext_table sp_help_fulltext_tables_cursor
sp_help_fulltext_catalogs
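Besides the stored procedures listed above, the catalog views give a quick overview of what exists in a database. A minimal sketch, assuming SQL Server 2005 or later and that the current database already contains full-text catalogs and indexes:

```sql
-- List the full-text catalogs in the current database.
SELECT name, is_default, is_accent_sensitivity_on
FROM sys.fulltext_catalogs;

-- List the full-text indexes with their table, catalog and change-tracking mode.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       c.name                   AS catalog_name,
       i.change_tracking_state_desc,
       i.is_enabled
FROM sys.fulltext_indexes AS i
INNER JOIN sys.fulltext_catalogs AS c
    ON c.fulltext_catalog_id = i.fulltext_catalog_id;
```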
Index vs. Full-text index
Full-text indexes vs. regular SQL Server indexes
Storage
• Full-text: stored in the file system, but administered through the database
• Regular: stored under the control of the database in which they are defined
Number per table
• Full-text: only 1 full-text index allowed per table
• Regular: several regular indexes allowed per table
Updates
• Full-text: addition of data to full-text indexes, called population, can be requested through either a schedule or a specific request, or can occur automatically with the addition of new data
• Regular: updated automatically when the data upon which they are based is inserted, updated, or deleted
Automatic update of index
Slows down database performance
Manually repopulate full text index
Time consuming
Asynchronous process in the background
Periods of low activity
Index not up to date
Administering Full-Text Search
How to: Creating a Full Text Catalog
SQL 2005 Only
SQL 2008 is smart
SQL 2005
Creating a Full-Text Catalog (SQL 2005)
Syntax
CREATE FULLTEXT CATALOG catalog_name
[ON FILEGROUP filegroup ] [IN PATH 'rootpath']
[WITH <catalog_option>]
[AS DEFAULT]
[AUTHORIZATION owner_name ]
<catalog_option>::= ACCENT_SENSITIVITY = {ON|OFF}
Example
USE AdventureWorks_FulllText;
CREATE FULLTEXT CATALOG AdventureWorks_FullTextCatalog
ON FILEGROUP FullTextCatalog_FG
WITH ACCENT_SENSITIVITY = ON
AS DEFAULT
AUTHORIZATION dbo;
Creating a Full-Text Catalog: Step by step
1. Create a directory on the operating system named C:\test
2. Launch SSMS, connect to your instance, and open a new query window
3. Add a new filegroup to the AdventureWorks_FulllText database and add a data file to it:
USE master;
GO
ALTER DATABASE AdventureWorks_FulllText ADD FILEGROUP FTFG1;
GO
ALTER DATABASE AdventureWorks_FulllText
ADD FILE (NAME = N'AdventureWorks_FulllText_data', FILENAME = N'C:\TEST\AdventureWorks_FulllText_data.ndf', SIZE = 2048KB, FILEGROWTH = 1024KB)
TO FILEGROUP [FTFG1];
GO
4. Create a full-text catalog on the FTFG1 filegroup by executing the following command:
USE AdventureWorks_FulllText;
GO
CREATE FULLTEXT CATALOG AWCatalog ON FILEGROUP FTFG1 IN PATH 'C:\TEST' AS DEFAULT;
GO
SQL 2008
How to: Creating Full Text Indexes
Property of column
Full-text Index property window
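The same index can also be created in T-SQL instead of the property window. This is a sketch only: it assumes the AdventureWorks_FullTextCatalog catalog from the earlier example already exists, and the KEY INDEX name is an assumption about how the primary key index is named in your copy of the database.

```sql
USE AdventureWorks_FulllText;
GO
-- Only one full-text index is allowed per table.
-- KEY INDEX must name a unique, single-column, non-nullable index on the table.
-- The index name below is an assumption: look up the real name in sys.indexes.
CREATE FULLTEXT INDEX ON Production.ProductDescription
    (Description LANGUAGE 1033)      -- 1033 = English word breaker and stemmer
    KEY INDEX PK_ProductDescription_ProductDescriptionID
    ON AdventureWorks_FullTextCatalog
    WITH CHANGE_TRACKING AUTO;       -- propagate data changes to the index automatically
GO
```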
How to Index and Catalog Population
Because of the external structure for storing full-text indexes, changes to underlying data columns are not immediately reflected in the full-text index. Instead, a background process enlists the word breakers, filters and noise word filters to build the tokens for each column, which are then merged back into the main index either automatically or manually. This update process is called population or a crawl. To keep your full-text indexes up to date, you must periodically populate them.
Populating a Full-Text Index
You can choose from three modes for full-text population:
Full
Incremental
Update
Populating a Full-Text Index
Full
Read and process all rows
Very resource-intensive
Incremental
Automatically populates the index for rows that were modified since the last population
Requires timestamp column
Update
Uses change tracking from SQL Server (inserts, updates, and deletes)
Specify how you want to propagate the changes to the index
• AUTO: automatic processing
• MANUAL: implement a manual method for processing changes
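The Update mode and its AUTO and MANUAL options can be sketched as follows, assuming a full-text index already exists on Production.ProductDescription:

```sql
-- MANUAL: changes are tracked but only applied when you ask for it.
ALTER FULLTEXT INDEX ON Production.ProductDescription SET CHANGE_TRACKING MANUAL;

-- Apply the tracked inserts, updates and deletes on demand,
-- e.g. from a scheduled job during a period of low activity.
ALTER FULLTEXT INDEX ON Production.ProductDescription START UPDATE POPULATION;

-- AUTO: tracked changes are propagated to the index in the background.
ALTER FULLTEXT INDEX ON Production.ProductDescription SET CHANGE_TRACKING AUTO;
```

With MANUAL the index can lag behind the table until the next update population runs, which is the trade-off against the continuous overhead of AUTO.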
Populating a Full-Text Index
Example
ALTER FULLTEXT INDEX ON Production.ProductDescription START FULL POPULATION;
ALTER FULLTEXT INDEX ON Production.Document START FULL POPULATION;
Populating a Full-Text Index
Syntax
ALTER FULLTEXT CATALOG catalog_name { REBUILD [ WITH ACCENT_SENSITIVITY = { ON | OFF } ] | REORGANIZE | AS DEFAULT }
REBUILD deletes the catalog and rebuilds it
Needed to change ACCENT_SENSITIVITY
REORGANIZE merges all changes into one index
Improves performance and frees up disk and memory
Populating a Full-Text Catalog
Example
USE AdventureWorks_FulllText;
ALTER FULLTEXT CATALOG AdventureWorks_FullTextCatalog REBUILD WITH ACCENT_SENSITIVITY=OFF;
-- Check Accentsensitivity
SELECT FULLTEXTCATALOGPROPERTY('AdventureWorks_FullTextCatalog', 'accentsensitivity');
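REORGANIZE and the catalog status can be sketched in the same way, assuming the catalog from the example above:

```sql
USE AdventureWorks_FulllText;

-- Merge the index fragments created by earlier populations into one index.
ALTER FULLTEXT CATALOG AdventureWorks_FullTextCatalog REORGANIZE;

-- PopulateStatus: 0 = idle, 1 = full population in progress (other values exist).
-- ItemCount: number of indexed items in the catalog.
SELECT FULLTEXTCATALOGPROPERTY('AdventureWorks_FullTextCatalog', 'PopulateStatus') AS populate_status,
       FULLTEXTCATALOGPROPERTY('AdventureWorks_FullTextCatalog', 'ItemCount')      AS item_count;
```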
Populating a Full-Text Catalog
Managing Population Schedules
In SQL 2000, full text catalogs could only be populated on specified schedules
SQL 2005/2008 can track database changes and keep the catalog up to date, with a minor performance hit
Full-Text query keywords
FREETEXT
FREETEXTTABLE
CONTAINS
CONTAINSTABLE
How to: Querying SQL Server Using Full-Text Search
FREETEXT
Fuzzy search (less precise)
Inflectional forms (Stemming)
Related words (Thesaurus)
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE [Description] LIKE N'%bike%';
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE FREETEXT(Description, N'bike');
FREETEXTTABLE
+ rank column
Value between 0 and 1,000
Relative number, how well the row matches the search criteria
SELECT
PD.ProductDescriptionID,
PD.Description,
KEYTBL.[KEY],
KEYTBL.RANK
FROM
Production.ProductDescription AS PD
INNER JOIN FREETEXTTABLE(Production.ProductDescription,
Description, N'bike')
AS KEYTBL ON PD.ProductDescriptionID = KEYTBL.[KEY]
CONTAINS
• Lets you specify which fuzzy matching algorithm to use
SELECT ProductDescriptionID, Description FROM
Production.ProductDescription
WHERE CONTAINS(Description, N'bike');
SELECT ProductDescriptionID, Description FROM
Production.ProductDescription
WHERE CONTAINS(Description, N'"bike*"');
INFLECTIONAL: considers word stems in the search, e.g. "ride" matches "riding", "ridden", ..
THESAURUS: returns synonyms, e.g. "metal" matches "gold", "aluminium", "steel", ..
SELECT ProductDescriptionID, Description FROM
Production.ProductDescription
WHERE CONTAINS(Description, N' FORMSOF (INFLECTIONAL, ride) ');
SELECT ProductDescriptionID, Description FROM
Production.ProductDescription
WHERE CONTAINS(Description, N' FORMSOF (THESAURUS, ride) ');
Word proximity: NEAR ( ~ )
How near words are in the text/document
SELECT ProductDescriptionID, Description FROM
Production.ProductDescription
WHERE CONTAINS(Description, N'mountain NEAR bike');
SELECT ProductDescriptionID, Description FROM
Production.ProductDescription
WHERE CONTAINS(Description, N'mountain ~ bike');
SELECT ProductDescriptionID, Description FROM
Production.ProductDescription
WHERE CONTAINS(Description, N'ISABOUT (mountain weight(.8), bikes weight(.2))');
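Weights only become visible when the rank is returned, which CONTAINS in a WHERE clause does not do. A minimal sketch using CONTAINSTABLE, assuming the same full-text index on Production.ProductDescription:

```sql
-- CONTAINSTABLE returns the key and a RANK column, so the effect of the weights can be sorted on.
SELECT PD.ProductDescriptionID,
       PD.Description,
       KEYTBL.RANK
FROM Production.ProductDescription AS PD
INNER JOIN CONTAINSTABLE(Production.ProductDescription,
                         Description,
                         N'ISABOUT (mountain WEIGHT(.8), bikes WEIGHT(.2))') AS KEYTBL
    ON PD.ProductDescriptionID = KEYTBL.[KEY]
ORDER BY KEYTBL.RANK DESC;   -- best matches first
```

RANK is a relative number within one result set, so it is useful for ordering but not for comparing different queries.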
Querying SQL Server Using Full-Text Search
Full-text search is much more powerful than LIKE
More specific, relevant results
Better performance
• LIKE for small amounts of text
• Full-text search scales to huge documents
Provides ranking of results
Common uses
Search through the content in a text-intensive, database driven website, e.g. a knowledge base
Search the contents of documents stored in BLOB fields
Perform advanced searches
• e.g. with exact phrases - "to be or not to be" (however needs care!)
• e.g. Boolean operators - AND, OR, NOT, NEAR
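The advanced searches mentioned above can be sketched like this, assuming the full-text index on Production.ProductDescription; the search words are chosen for illustration only. A phrase made up entirely of stopwords, such as the one quoted above, may return no rows with the default stoplist, which is one reason phrase searches need care.

```sql
-- Exact phrase: the double quotes make "mountain bike" a single search term.
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'"mountain bike"');

-- Boolean operators: rows that mention aluminum or steel, but not frame.
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'(aluminum OR steel) AND NOT frame');
```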
The power of FTS is in the expression which is passed to the CONTAINS or CONTAINSTABLE function
Several different types of terms:
Simple terms
Prefix terms
Generation terms
Proximity terms
Weighted terms
Writing FTS terms
Either words or phrases
Quotes are optional for a single word, but a phrase must be enclosed in double quotes
Matches columns which contain the exact words or phrases specified
Case insensitive
Punctuation is ignored
e.g.
CONTAINS(Column, 'SQL')
CONTAINS(Column, ' "SQL" ')
CONTAINS(Column, ' "Microsoft SQL Server" ')
CONTAINS(Column, 'Microsoft SQL Server') fails with a syntax error because the phrase is not quoted
Simple terms
Matches words beginning with the specified text
e.g.
CONTAINS(Column, ' "local*" ')
• matches local, locally, locality
CONTAINS(Column, ' "local wine*" ')
• matches "local winery", "locally wined"
Prefix terms
Inflectional
FORMSOF(INFLECTIONAL, "expression")
"drive" matches "drove", "driven", .. (share the same stem)
When vague words such as "best" are used, doesn't match the exact word, only "good"
Thesaurus
FORMSOF(THESAURUS, "expression")
"metal" matches "gold", "aluminium", "steel", ..
Both return variants of the specified word, but variants are determined differently
Generation terms
Supposed to match synonyms of search terms – but the thesaurus seems to be very limited
Does not match plurals
Not particularly useful
http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506231
Thesaurus
Syntax
CONTAINS(Column, 'local NEAR winery')
CONTAINS(Column, ' "local" NEAR "winery" ')
Important for ranking
Both words must be in the column, like AND
Terms on either side of NEAR must be either simple or proximity terms
Proximity terms
Each word can be given a rank
Can be combined with simple, prefix, generation and proximity terms
e.g.
CONTAINS(Column, 'ISABOUT(performance weight(.8), comfortable weight(.4))')
CONTAINS(Column, 'ISABOUT(FORMSOF(INFLECTIONAL, "performance") weight(.8), FORMSOF(INFLECTIONAL, "comfortable") weight(.4))')
Weighted terms
Pro / Contra
Pros? Cons?
Full text catalogs
Disk space
Up-to-date
Continuous updating performance hit
Queries
Complicated to generate
Generated as a string
Generated on the client
Disadvantages
Backing up full text catalogs
SQL 2005
Included in SQL backups by default
Retained on detach and re-attach
Option in detach dialog to keep the full text catalog
In SQL2008 you don’t have to worry about this
Advantages
Much more powerful than LIKE
Specific
Ranking
Performance
Pre-computed ranking (FREETEXTTABLE)
Configurable Population Schedule
Continuously track changes, or index when the CPU is idle
Advantages
Pluralcast - SQL Server Under the Covers
http://shrinkster.com/1ff4
Dotnetrocks - Search for SQL Server
http://www.dotnetrocks.com/archives.aspx
RunAsRadio - Search for SQL Server
http://www.runasradio.com/archives.aspx
Quick tips - Podcasts
Full text search
Download from Course Materials Site (to copy/paste scripts) or type manually:
http://sharepoint.ssw.com.au/Training/UTSSQL/
Session 5 Lab
Thank You!
Gateway Court Suite 10 81 - 91 Military Road Neutral Bay, Sydney NSW 2089 AUSTRALIA
ABN: 21 069 371 900
Phone: + 61 2 9953 3000 Fax: + 61 2 9953 3105