IR Presentation
![Page 1: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/1.jpg)
By Bushra Al-Za’areer
Introducing
Signature File – Suffix Tree & Suffix Array
![Page 2: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/2.jpg)
Chapter 9 Indexing & Searching
1 Signature File
2 Suffix Tree
3 Suffix Array
![Page 3: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/3.jpg)
1 Signature File
![Page 4: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/4.jpg)
Signature File chapter 9
• Consider:
  o H(information) = 010001
  o H(text) = 010010
  o H(data) = 110000
  o H(retrieval) = 100010
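• Take a document D containing the text “textual retrieval and information retrieval”. After removing stop words and stemming, the terms are text, retrieval, information, retrieval. With a block size of two terms, each block signature is the bitwise OR of the signatures of the terms in that block:
  o B1D = H(text) OR H(retrieval) = 110010
  o B2D = H(information) OR H(retrieval) = 110011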
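The superimposed-coding step can be sketched in Python. This is a minimal illustration, not the presentation's own code, and it rests on a few assumptions:

- The names `TERM_SIGNATURES` and `block_signatures` are hypothetical.
- The term signatures are copied from the H(...) table above rather than computed by a real hash function.
- Stop-word removal and stemming are assumed to have already produced the term list.

```python
# Term signatures copied from the H(...) table (not computed by a real hash here)
TERM_SIGNATURES = {
    "information": 0b010001,
    "text": 0b010010,
    "data": 0b110000,
    "retrieval": 0b100010,
}
WIDTH = 6  # signature length in bits, as in the example

def block_signatures(terms, block_size=2):
    """Split the term sequence into blocks and OR together the term signatures of each block."""
    signatures = []
    for start in range(0, len(terms), block_size):
        sig = 0
        for term in terms[start:start + block_size]:
            sig |= TERM_SIGNATURES[term]  # superimpose this term's bits onto the block
        signatures.append(sig)
    return signatures

# "textual retrieval and information retrieval" after stop-word removal and stemming (assumed)
terms = ["text", "retrieval", "information", "retrieval"]
for i, sig in enumerate(block_signatures(terms), start=1):
    print(f"B{i}D = {sig:0{WIDTH}b}")  # B1D = 110010, then B2D = 110011
```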
![Page 5: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/5.jpg)
To search for a given term, we check whether the term’s bit string could be “inside” a block signature. That is, every bit set in the term’s signature must also be set in the block signature (term AND block = term):
• Suppose we are searching for “text” in document D:
  o H(text) = 010010 and B1D = 110010
  o H(text) bit-wise-AND B1D = 010010 = H(text)
  o Therefore “text” could be in B1D (in this particular case, it is).
• Now suppose we are searching for “data”:
  o H(data) bit-wise-AND B1D = 110000 = H(data)
  o H(data) bit-wise-AND B2D = 110000 = H(data)
  o Yet “data” is not in either block!
• Signature files may yield false hits …
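The block test can be sketched the same way. This is a hedged illustration with the following assumptions:

- `may_contain` and `search` are hypothetical names.
- A block passes when term AND block = term.
- The real block contents are stored only so the output can label each match as a true hit or a false drop. A real signature file would have to check the text itself.

```python
# Term signatures from the H(...) table and block signatures of document D
TERM_SIGNATURES = {
    "information": 0b010001,
    "text": 0b010010,
    "data": 0b110000,
    "retrieval": 0b100010,
}
BLOCKS = {"B1D": 0b110010, "B2D": 0b110011}
# Actual block contents, used only to label matches (assumption for illustration)
BLOCK_TERMS = {"B1D": {"text", "retrieval"}, "B2D": {"information", "retrieval"}}

def may_contain(term_sig, block_sig):
    # Every 1-bit of the term signature must also be set in the block signature
    return (term_sig & block_sig) == term_sig

def search(term):
    results = []
    for name, block_sig in BLOCKS.items():
        if may_contain(TERM_SIGNATURES[term], block_sig):
            kind = "true hit" if term in BLOCK_TERMS[name] else "false drop"
            results.append((name, kind))
    return results

print(search("text"))  # [('B1D', 'true hit'), ('B2D', 'false drop')]
print(search("data"))  # [('B1D', 'false drop'), ('B2D', 'false drop')]
```

The test also flags “text” in B2D (010010 AND 110011 = 010010). That is a further false drop, alongside the two for “data”.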
![Page 6: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/6.jpg)
How can we keep the probability of false alarms low?
How can we predict how good a signature is?
o A false drop occurs when a document signature matches a query’s signature but the query’s word does not match any word in the document.
• The rate of false drops depends on:
  o The size of the signature.
  o The number of words per block.
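One common way to quantify this is the Bloom-filter-style approximation for superimposed coding. It uses three quantities:

- F: the number of bits in a block signature.
- D: the number of words per block.
- m: the number of bits each word sets, chosen at random.

A one-word query then matches a block that does not contain it with probability about (1 - (1 - 1/F)^(m*D))^m. The sketch below assumes that model. The parameter m and the function name are not from the slides; in the example above, every H(...) happens to set m = 2 of F = 6 bits.

```python
def false_drop_probability(sig_bits, bits_per_word, words_per_block):
    # Assumed model: each word sets `bits_per_word` of the `sig_bits` positions at random.
    # Approximate chance that one particular bit is still 0 after all words in the block are OR-ed in:
    p_zero = (1 - 1 / sig_bits) ** (bits_per_word * words_per_block)
    # A one-word query is a false drop only if all of its 1-bits happen to be set:
    return (1 - p_zero) ** bits_per_word

for sig_bits in (6, 64, 256):
    for words_per_block in (2, 8):
        p = false_drop_probability(sig_bits, 2, words_per_block)
        print(f"F={sig_bits:3d}  D={words_per_block}  false-drop rate ~ {p:.4f}")
```

Increasing F lowers the estimate and increasing D raises it. These are the two factors listed above.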
![Page 7: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/7.jpg)
• Inverted or signature? Inverted files:
  1. Slower retrieval
  2. More accurate
  3. Easier to maintain
• In fact, inverted files are still the most popular storage for information retrieval.
![Page 8: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/8.jpg)
2 Suffix Tree summary
![Page 9: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/9.jpg)
• Example:
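To make the structure concrete, here is a minimal sketch of a naive suffix trie (an uncompacted suffix tree). It is built by inserting every suffix of a string character by character, with a "$" terminator. The function names are hypothetical and "banana" is an arbitrary illustration string. A real suffix tree merges single-child paths into labelled edges and is usually built in linear time (for example with Ukkonen's algorithm); this O(n²) sketch does not attempt that.

```python
def build_suffix_trie(text):
    """Naive suffix trie: insert every suffix text[i:] + '$' one character at a time."""
    root = {}
    text += "$"  # terminator: no suffix is then a prefix of another suffix
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
        node["start"] = i  # the leaf remembers where its suffix begins
    return root

def occurrences(trie, pattern):
    """Start positions of `pattern`: walk down from the root, then collect the leaves below."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    positions, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "start":
                positions.append(child)  # leaf marker: a suffix starting with the pattern
            else:
                stack.append(child)
    return sorted(positions)

trie = build_suffix_trie("banana")
print(occurrences(trie, "ana"))  # [1, 3]
print(occurrences(trie, "nab"))  # []
```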
![Page 10: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/10.jpg)
3 Suffix Array summary
![Page 11: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/11.jpg)
• Suffix tree and suffix array indexes treat the text as one long string. Each position in the text is considered the start of a text suffix (the text from that position to the end). Each suffix is thus uniquely identified by its position.
• Index points are selected from the text. They point to the beginnings of the text positions that will be retrievable, for example the start of every word.
• This structure can be used to index words or characters.
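A minimal sketch of choosing index points follows. It assumes a word is a maximal run of letters, digits or underscores (Python's `\w`), and it reuses the example text from the signature-file slides. Word-level index points make words and phrases retrievable. Character-level points make every substring retrievable, at the cost of many more pointers. The function names are hypothetical.

```python
import re

def word_index_points(text):
    """Index points at the start of every word (maximal run of letters, digits or underscores)."""
    return [m.start() for m in re.finditer(r"\w+", text)]

def character_index_points(text):
    """Index points at every character position, so every suffix is retrievable."""
    return list(range(len(text)))

text = "textual retrieval and information retrieval"
for p in word_index_points(text):
    print(p, repr(text[p:]))  # each index point identifies exactly one suffix
print(len(word_index_points(text)), "word vs", len(character_index_points(text)), "character index points")
```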
![Page 12: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/12.jpg)
![Page 13: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/13.jpg)
• Suffix arrays provide essentially the same functionality as suffix trees with much smaller space requirements.
• A suffix array is simply an array containing all the pointers to the text suffixes listed in lexicographical order.
• Suffix arrays are designed to allow binary search, performed by comparing the query with the text that each pointer points to.
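The two ideas above, pointers sorted by the suffix they point to and binary search over the text behind each pointer, can be sketched as follows. This is an illustrative, hypothetical implementation with a few limits:

- It uses word-level index points on the earlier example text.
- It uses a naive sort that copies suffixes; practical suffix-array construction uses specialised algorithms.

Because the suffixes are sorted, their first k characters are sorted too. Two binary searches over the truncated suffixes therefore find the contiguous range of pointers whose suffix starts with the query.

```python
import re

def build_suffix_array(text, index_points):
    # Pointers sorted lexicographically by the suffix each one points to
    return sorted(index_points, key=lambda p: text[p:])

def find_range(text, sa, pattern):
    # Half-open range [start, end) of entries whose suffix starts with `pattern`.
    # Each comparison reads the text behind a pointer, truncated to len(pattern).
    k = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first entry whose truncated suffix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first entry whose truncated suffix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo

text = "textual retrieval and information retrieval"
points = [m.start() for m in re.finditer(r"\w+", text)]  # word-start index points
sa = build_suffix_array(text, points)
print(sa)  # [18, 22, 34, 8, 0]
start, end = find_range(text, sa, "retrieval")
print(sorted(sa[start:end]))  # [8, 34]
```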
![Page 14: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/14.jpg)
• With suffix trees and suffix arrays we can search for:
– Words
– Prefixes & suffixes
– Phrases
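A sketch of these query types over a suffix array, under stated assumptions:

- The example text is reused.
- The suffix array is materialised as sorted (suffix, position) pairs so that Python's `bisect` can search it directly.
- A match counts as a whole word, or as a word suffix, when the next character is not a word character.
- Word, prefix and phrase queries use word-start index points. Suffix queries (words ending in a given string) need an index point at every character.
- All names are hypothetical.

```python
import bisect
import re

TEXT = "textual retrieval and information retrieval"  # example text from the signature-file slides

def sorted_suffixes(text, points):
    # Materialised suffix array: (suffix, position) pairs in lexicographic order
    return sorted((text[p:], p) for p in points)

def prefix_matches(suffixes, pattern):
    # Binary search to the first suffix >= pattern; every suffix that starts
    # with the pattern lies in one contiguous run from that point on
    i = bisect.bisect_left(suffixes, (pattern,))
    hits = []
    while i < len(suffixes) and suffixes[i][0].startswith(pattern):
        hits.append(suffixes[i][1])
        i += 1
    return sorted(hits)

def at_word_end(text, pos, pattern):
    # True if the match is not followed by another word character
    end = pos + len(pattern)
    return re.match(r"\w", text[end:end + 1]) is None

word_points = [m.start() for m in re.finditer(r"\w+", TEXT)]
by_word = sorted_suffixes(TEXT, word_points)       # index points at word starts
by_char = sorted_suffixes(TEXT, range(len(TEXT)))  # index point at every character

word_hits = [p for p in prefix_matches(by_word, "retrieval") if at_word_end(TEXT, p, "retrieval")]
prefix_hits = prefix_matches(by_word, "retr")
phrase_hits = prefix_matches(by_word, "information retrieval")
suffix_hits = [p for p in prefix_matches(by_char, "val") if at_word_end(TEXT, p, "val")]
print(word_hits, prefix_hits, phrase_hits, suffix_hits)  # [8, 34] [8, 34] [22] [14, 40]
```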
![Page 15: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/15.jpg)
Any questions? Ask me!
![Page 16: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/16.jpg)
Conclusion
The most popular storage for information retrieval: inverted files.
![Page 17: IR Presentation](https://reader033.fdocuments.in/reader033/viewer/2022051214/55cf929c550346f57b98020a/html5/thumbnails/17.jpg)
What’s Your Message?
Thank You