(C) 2003, The University of Michigan 1
Information Retrieval
Handout #4
February 10, 2003
Course Information
• Instructor: Dragomir R. Radev ([email protected])
• Office: 3080, West Hall Connector
• Phone: (734) 615-5225
• Office hours: M&F 11-12
• Course page: http://tangra.si.umich.edu/~radev/650/
• Class meets on Mondays, 1-4 PM in 409 West Hall
Indexing and searching (cont’d)
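Suffix trees
Text, with character positions numbered 1-67:
This is a text. A text has many words. Words are made from letters.
[Figure: Patricia tree over the suffixes that start at positions 11, 19, 28, 33, 40, 50 and 60 (case ignored). The root branches on “l” (leaf 60); on “m”, which splits after “ma” into “n” (leaf 28) and “d” (leaf 50); on “text”, which splits into “.” (leaf 11) and a space (leaf 19); and on “words”, which splits into “.” (leaf 33) and a space (leaf 40).]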
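A minimal Python sketch of a Patricia tree (compacted trie) over the suffixes of the sample text, for illustration only. Assumptions: every word start is an index point after lowercasing (a hypothetical choice; the tree on the slide uses only seven index points), and the names `Node`, `insert`, `build_patricia` and `search` are hypothetical. Edge labels are stored as copied substrings for readability; a practical suffix tree would store (start, end) offsets into the text instead.

```python
import re


class Node:
    """A Patricia-tree node: outgoing edges keyed by their first character."""

    def __init__(self):
        self.edges = {}      # first char -> (edge label, child node)
        self.positions = []  # text positions whose suffix ends exactly here


def insert(root, key, pos):
    """Insert suffix `key` (starting at text position `pos`) into the tree."""
    node = root
    while True:
        if not key:
            node.positions.append(pos)
            return
        first = key[0]
        if first not in node.edges:
            leaf = Node()
            leaf.positions.append(pos)
            node.edges[first] = (key, leaf)
            return
        label, child = node.edges[first]
        # Length of the common prefix of the edge label and the key.
        i = 0
        while i < len(label) and i < len(key) and label[i] == key[i]:
            i += 1
        if i == len(label):
            # Whole edge matched: descend and insert the rest of the key.
            node, key = child, key[i:]
            continue
        # Partial match: split the edge after its first i characters.
        mid = Node()
        mid.edges[label[i]] = (label[i:], child)
        node.edges[first] = (label[:i], mid)
        node, key = mid, key[i:]


def build_patricia(text):
    """Index the suffix starting at each word (1-based positions, lowercased)."""
    folded = text.lower()
    root = Node()
    for m in re.finditer(r"\w+", folded):
        insert(root, folded[m.start():], m.start() + 1)
    return root


def search(root, prefix):
    """Return sorted positions of indexed suffixes that begin with `prefix`."""
    node, prefix = root, prefix.lower()
    while prefix:
        if prefix[0] not in node.edges:
            return []
        label, child = node.edges[prefix[0]]
        n = min(len(label), len(prefix))
        if label[:n] != prefix[:n]:
            return []
        node, prefix = child, prefix[n:]
    # Every suffix stored in this subtree starts with the prefix.
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        found.extend(current.positions)
        stack.extend(child for _, child in current.edges.values())
    return sorted(found)


text = "This is a text. A text has many words. Words are made from letters."
root = build_patricia(text)
print(search(root, "text"))   # [11, 19]
print(search(root, "words"))  # [33, 40]
print(search(root, "ma"))     # [28, 50]
```

A prefix query walks down at most one path and then collects every leaf below it, which is why all occurrences of “text” end up in a single subtree, as in the figure.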
Sequential string searching
• Boyer-Moore algorithm
• Example: search for “cats” in “the catalog of all cats”
• Some preprocessing of the pattern is needed before the search starts.
• Demos:
http://www.blarg.com/~doyle/pages/bmi.html
http://www-sr.informatik.uni-tuebingen.de/~buehler/BM/BM.html
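The slide's example can be traced with a simplified Boyer-Moore sketch in Python. Assumptions: only the bad-character rule is implemented (the full algorithm also uses a good-suffix rule), the pattern shifts by one after a full match, and the function names are hypothetical. The preprocessing step is the `last_occurrence` table, built once from the pattern before any text is scanned.

```python
def last_occurrence(pattern):
    """Preprocessing: rightmost index of each character in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def bm_search(text, pattern):
    """Return 0-based start offsets of every occurrence of pattern in text."""
    if not pattern:
        return []
    last = last_occurrence(pattern)
    m, n = len(pattern), len(text)
    matches = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare the pattern with the text from right to left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1  # simple shift after a full match
        else:
            # Bad-character rule: line up the mismatched text character with
            # its rightmost occurrence in the pattern, or jump past it.
            s += max(1, j - last.get(text[s + j], -1))
    return matches


print(bm_search("the catalog of all cats", "cats"))  # [19]
```

On this input most alignments fail at the last pattern character on a letter that is not in “cats” (or on a space), so the pattern skips ahead four positions at a time instead of one.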
Latent Semantic Indexing
LSI
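• Dimensionality reduction: approximate the term-by-document matrix A with a lower-rank matrix (via singular value decomposition)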
[Figure: term-by-document matrix A, with rows cosmonaut, astronaut, moon, car, truck and columns d1 to d5; all entries are 0 or 1.]
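A minimal NumPy sketch of LSI as truncated SVD, assuming NumPy is available. The 0/1 matrix below is hypothetical: it reuses the slide's term labels, but its values are made up for illustration and are not read off the slide. The choice k = 2 and the helper names `lsi` and `cosine` are also assumptions. The code keeps the top k singular values, forms the rank-k approximation A_k, and compares documents by cosine similarity in the reduced space.

```python
import numpy as np

# HYPOTHETICAL 0/1 term-by-document matrix: rows are terms, columns d1..d5.
# The values are made up for illustration, not copied from the slide.
terms = ["cosmonaut", "astronaut", "moon", "car", "truck"]
A = np.array([
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
], dtype=float)


def lsi(A, k):
    """Truncated SVD: rank-k approximation of A plus k-dim document vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    docs_k = np.diag(s[:k]) @ Vt[:k, :]  # column j = document j in LSI space
    return A_k, docs_k, s


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


A_2, docs_2, s = lsi(A, k=2)
print("cos(d2, d3) in the original space:", cosine(A[:, 1], A[:, 2]))
print("cos(d2, d3) in the 2-D LSI space:", cosine(docs_2[:, 1], docs_2[:, 2]))
```

In this toy matrix d2 and d3 share no terms, so their original cosine is 0; after the reduction they can pick up a nonzero similarity because cosmonaut and moon co-occur in d1. The reconstruction error of A_k in the Frobenius norm equals the square root of the sum of the discarded squared singular values.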
Text tiling
Text tiling
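• Change in cohesion = topic boundary: where lexical cohesion between adjacent stretches of text drops sharply, a new topic is likely to begin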
[Figure: cohesion plotted along the text; example from Manning and Schuetze.]
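A simplified Python sketch of the text-tiling idea, under stated assumptions: the text is already split into sentences, cohesion at each gap is the cosine similarity between the word counts of the k sentences before and after the gap, and a gap becomes a boundary when its valley depth (climbing to the nearest peak on each side, in the spirit of Hearst's TextTiling) reaches a fixed threshold. The values k = 2 and threshold = 0.5, the toy sentences, and all function names are hypothetical; TextTiling itself works on fixed-size token sequences and derives its cutoff from the distribution of depth scores.

```python
import re
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(sentences, k=2):
    """Cohesion at each gap: the k sentences before it vs. the k after it."""
    bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - k):gap], Counter())
        right = sum(bags[gap:gap + k], Counter())
        scores.append(cosine(left, right))
    return scores


def depth(scores, i):
    """Depth of the valley at gap i: climb to the nearest peak on each side."""
    left = i
    while left > 0 and scores[left - 1] >= scores[left]:
        left -= 1
    right = i
    while right < len(scores) - 1 and scores[right + 1] >= scores[right]:
        right += 1
    return (scores[left] - scores[i]) + (scores[right] - scores[i])


def boundaries(sentences, k=2, threshold=0.5):
    """Indices of sentences that start a new topic (deep cohesion valleys)."""
    scores = gap_scores(sentences, k)
    return [i + 1 for i in range(len(scores)) if depth(scores, i) >= threshold]


sentences = [
    "Cats chase mice.", "Mice fear cats.", "Cats catch mice.",
    "Stocks rose sharply.", "Markets lifted stocks.", "Markets and stocks rallied.",
]
print(boundaries(sentences))  # [3]
```

Cohesion falls to zero at the gap between the third and fourth sentences, where the vocabulary switches from cats and mice to stocks and markets, so that gap is the only one reported.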