Information Retrieval

Page 1: Information Retrieval

(C) 2003, The University of Michigan 1

Information Retrieval

Handout #4

February 10, 2003

Page 2: Information Retrieval

Course Information

• Instructor: Dragomir R. Radev ([email protected])

• Office: 3080, West Hall Connector

• Phone: (734) 615-5225

• Office hours: M&F 11-12

• Course page: http://tangra.si.umich.edu/~radev/650/

• Class meets on Mondays, 1-4 PM in 409 West Hall

Page 3: Information Retrieval

Indexing and searching (cont’d)

Page 4: Information Retrieval

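Suffix trees

Sample text, with character positions 1 to 67:
This is a text. A text has many words. Words are made from letters.

[Figure: suffix trie whose leaves are the index points 11 ("text."), 19 ("text "), 28 ("many"), 33 ("words."), 40 ("Words"), 50 ("made") and 60 ("letters."). Paths: l (60); m-a, then d (50) or n (28); t-e-x-t, then "." (11) or space (19); w-o-r-d-s, then space (40) or "." (33). The compressed version of this trie is labeled "Patricia tree".]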
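The figure can be approximated in code. The sketch below is a minimal compressed trie (radix tree) over the suffixes that start at the slide's index points 11, 19, 28, 33, 40, 50 and 60. Assumptions: the text is case-folded (so "Words" at 40 groups with "words." at 33), a '$' terminator that does not occur in the text ends each suffix, and the names `Node`, `insert`, `build` and `search` are hypothetical. A Patricia tree proper stores skip counts instead of edge strings; this version keeps the edge strings for readability.

```python
class Node:
    """Node of a compressed trie: edges carry strings, leaves carry a position."""
    def __init__(self, pos=None):
        self.children = {}  # first char of edge label -> (edge label, child)
        self.pos = pos      # 1-based index point for a leaf, None otherwise


def insert(root, key, pos):
    """Insert one suffix, splitting an edge where the key diverges from it."""
    node = root
    while True:
        edge = node.children.get(key[0])
        if edge is None:                      # no edge starts with this char
            node.children[key[0]] = (key, Node(pos))
            return
        label, child = edge
        k = 0                                 # length of the shared prefix
        while k < len(label) and k < len(key) and label[k] == key[k]:
            k += 1
        if k == len(label):                   # whole edge matched: descend
            node, key = child, key[k:]
            continue
        mid = Node()                          # split the edge at offset k
        mid.children[label[k]] = (label[k:], child)
        mid.children[key[k]] = (key[k:], Node(pos))
        node.children[key[0]] = (label[:k], mid)
        return


def build(text, index_points):
    """Index the suffixes starting at the given 1-based index points.
    The text is case-folded and '$' (assumed absent from it) ends each suffix."""
    root = Node()
    folded = text.lower()
    for p in index_points:
        insert(root, folded[p - 1:] + "$", p)
    return root


def search(root, prefix):
    """Return the sorted index points whose suffix starts with prefix."""
    node, rest = root, prefix.lower()
    while rest:
        edge = node.children.get(rest[0])
        if edge is None:
            return []
        label, child = edge
        if label.startswith(rest):            # prefix ends inside this edge
            node, rest = child, ""
        elif rest.startswith(label):          # consume the edge and go on
            node, rest = child, rest[len(label):]
        else:
            return []
    hits, stack = [], [node]                  # every leaf below is a match
    while stack:
        n = stack.pop()
        if n.pos is not None:
            hits.append(n.pos)
        stack.extend(child for _, child in n.children.values())
    return sorted(hits)


TEXT = "This is a text. A text has many words. Words are made from letters."
root = build(TEXT, [11, 19, 28, 33, 40, 50, 60])
print(search(root, "text"))   # [11, 19]
print(search(root, "words"))  # [33, 40]
```

Only word beginnings are indexed, so a query such as "x" returns nothing even though "text" contains an x; that is the usual trade-off of indexing sistrings at selected index points rather than at every character.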

Page 5: Information Retrieval

Sequential string searching

• Boyer-Moore algorithm

• Example: search for “cats” in “the catalog of all cats”

• Some preprocessing of the pattern is needed.

• Demos:

http://www.blarg.com/~doyle/pages/bmi.html
http://www-sr.informatik.uni-tuebingen.de/~buehler/BM/BM.html
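A sketch of the slide's example, assuming a simplified Boyer-Moore that applies only the bad-character rule; the full algorithm also uses a good-suffix rule, which is omitted here. The preprocessing step is the table built by `last_occurrence` from the pattern; the function names are hypothetical.

```python
def last_occurrence(pattern):
    """Preprocessing: rightmost index of each character in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def bm_search(text, pattern):
    """Return (0-based match offsets, number of character comparisons).

    The pattern is compared right to left; on a mismatch the bad-character
    rule shifts it so the offending text character lines up with its
    rightmost occurrence in the pattern (or skips past it if absent).
    """
    last = last_occurrence(pattern)
    n, m = len(text), len(pattern)
    matches, comparisons, s = [], 0, 0     # s = current alignment in text
    while s <= n - m:
        j = m - 1
        while j >= 0:
            comparisons += 1
            if pattern[j] != text[s + j]:
                break
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1                          # plain shift after a full match
        else:
            s += max(1, j - last.get(text[s + j], -1))
    return matches, comparisons


print(bm_search("the catalog of all cats", "cats"))  # ([19], 10)
```

On the slide's example the pattern is tried at only seven alignments (offsets 0, 4, 6, 10, 14, 18 and 19), and 10 character comparisons locate "cats" at offset 19. Most alignments are rejected after a single comparison because the text character under the last pattern position does not occur in "cats" at all.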

Page 6: Information Retrieval

Latent Semantic Indexing

Page 7: Information Retrieval

LSI

• Dimensionality reduction

[Figure: term-document matrix A with rows cosmonaut, astronaut, moon, car, truck and columns d1 to d5 (binary entries)]
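Dimensionality reduction in LSI is usually done with a truncated singular value decomposition A = UΣVᵀ, keeping only the k largest singular values. The sketch below, assuming NumPy, reuses the slide's five terms and five documents, but the 0/1 entries are an illustrative assumption, not the slide's matrix; `lsi_docs` and `cosine` are hypothetical helper names. With these entries d2 (astronaut, moon) and d3 (cosmonaut) share no term, so their cosine in term space is 0; after reduction to k = 2 they move close together because moon co-occurs with cosmonaut in d1.

```python
import numpy as np

terms = ["cosmonaut", "astronaut", "moon", "car", "truck"]
# Illustrative binary term-document matrix (rows = terms, columns = d1..d5).
# These entries are an assumption for the sketch, not the slide's values.
A = np.array([
    [1, 0, 1, 0, 0],  # cosmonaut
    [0, 1, 0, 0, 0],  # astronaut
    [1, 1, 0, 0, 0],  # moon
    [1, 0, 0, 1, 1],  # car
    [0, 0, 0, 1, 0],  # truck
], dtype=float)


def lsi_docs(A, k):
    """Return a k x n matrix whose column j is document j in the latent space,
    i.e. Sigma_k @ V_k^T from the SVD A = U Sigma V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return np.diag(s[:k]) @ Vt[:k, :]


def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


B = lsi_docs(A, k=2)
print(round(cosine(A[:, 1], A[:, 2]), 2))  # d2 vs d3 in term space: 0.0
print(round(cosine(B[:, 1], B[:, 2]), 2))  # d2 vs d3 after reduction: now clearly positive
```

Cosines computed from `B` do not depend on the arbitrary signs NumPy chooses for the singular vectors, because flipping a singular vector flips that coordinate for every document at once.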

Page 8: Information Retrieval

Text tiling

Page 9: Information Retrieval

Text tiling

• Change in cohesion = topic boundary

[Figure: cohesion plotted along the text]

Example from Manning and Schuetze
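A minimal sketch of the idea on this slide, in the spirit of Hearst's TextTiling but simplified: cohesion at each gap is the cosine between the word counts of the k sentences on either side, each gap's depth is how far it sits below the nearest peaks on both sides, and gaps deeper than a threshold become boundaries. The block size `k=2`, the fixed threshold 0.5, whitespace tokenization, the toy sentences and all function names are assumptions; TextTiling itself works on fixed-length token sequences, smooths the scores and sets the cutoff from the mean and standard deviation of the depths.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two bags of words."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def gap_scores(sentences, k=2):
    """Cohesion at each gap i (between sentence i-1 and i): cosine of the word
    counts in the k sentences before the gap and the k sentences after it."""
    bags = [Counter(s.lower().split()) for s in sentences]
    scores = []
    for i in range(1, len(bags)):
        left = sum(bags[max(0, i - k):i], Counter())
        right = sum(bags[i:i + k], Counter())
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """Climb to the nearest peak on each side of a gap and add the two drops."""
    depths = []
    for i, s in enumerate(scores):
        l = i
        while l > 0 and scores[l - 1] >= scores[l]:
            l -= 1
        r = i
        while r < len(scores) - 1 and scores[r + 1] >= scores[r]:
            r += 1
        depths.append((scores[l] - s) + (scores[r] - s))
    return depths


def boundaries(sentences, k=2, threshold=0.5):
    """Indices of sentences that start a new segment (threshold is assumed)."""
    depths = depth_scores(gap_scores(sentences, k))
    return [i + 1 for i, d in enumerate(depths) if d > threshold]


sentences = [  # toy two-topic text, made up for this sketch
    "moon astronaut orbit", "astronaut moon orbit", "orbit moon astronaut",
    "car truck road", "road car truck", "truck road car",
]
print(boundaries(sentences))  # [3]: the topic changes before sentence 3
```

Using depth rather than the raw score means a boundary is reported only where cohesion falls and then recovers, so a text whose cohesion is uniformly low does not get split at every gap.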