IR The power of “failing”. TTT 2 Not perfectly true but...

Date posted: 20-Dec-2015

IR

The power of “failing”

TTT 2

Not perfectly true but...

[Plot: false positive rate (y-axis, 0 to 0.1) versus number of hash functions k (x-axis, 0 to 10), for m/n = 8]

Opt k = 8 ln 2 = 5.545...

We do have an explicit formula for the optimal k: k = (m/n) ln 2
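
The curve on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming the standard approximation for the false positive rate of a Bloom filter with m bits, n keys and k hash functions, p ≈ (1 - e^(-kn/m))^k, which is minimized at k = (m/n) ln 2. The function names are illustrative and do not come from the slides.

```python
import math


def false_positive_rate(bits_per_key: float, k: int) -> float:
    """Approximate false positive rate (1 - e^(-k / (m/n)))^k.

    bits_per_key is the ratio m/n; k is the number of hash functions.
    """
    return (1.0 - math.exp(-k / bits_per_key)) ** k


def optimal_k(bits_per_key: float) -> float:
    """Real-valued k that minimizes the approximation: (m/n) * ln 2."""
    return bits_per_key * math.log(2)


if __name__ == "__main__":
    ratio = 8  # m/n = 8, as on the slide
    for k in range(1, 11):
        print(k, round(false_positive_rate(ratio, k), 4))
    print("optimal k:", optimal_k(ratio))
```

Since k must be an integer in practice, one would compare the two integers around the real-valued optimum and pick the one with the lower rate.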

Other advantage: no key storage

Crawling

What data structure should we use to keep track of the visited URLs of a crawler?

URLs are long

Check should be very fast

Small errors are tolerable (a false positive ≈ a page not crawled)

Bloom Filter over crawled URLs
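
A Bloom filter over crawled URLs could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name, the use of SHA-256, and the trick of deriving the k positions from two base hashes (double hashing) are choices made here, not something the slides prescribe. Note that the URLs themselves are never stored, only m bits.

```python
import hashlib


class BloomFilter:
    """Bit array of m bits probed at k positions per key (illustrative)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # m bits, packed

    def _positions(self, key: str):
        # Assumption: derive k positions from two 64-bit halves of one
        # SHA-256 digest (double hashing), instead of k separate functions.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(key))


if __name__ == "__main__":
    visited = BloomFilter(num_bits=8 * 1000, num_hashes=6)  # m/n = 8 for ~1000 URLs
    for url in ["http://example.com/", "http://example.com/a"]:
        if url not in visited:
            visited.add(url)  # a real crawler would fetch the page here
    print("http://example.com/" in visited)
```

A false positive makes the crawler believe a URL was already visited, so the only cost of an error is a page that is skipped.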

Anti-virus detection

D is a dictionary of virus checksums of some given length z. For each position i of the file F, check…

Brute-force check: O( |D| * |F| ) time

Trie check: O( z * |F| ) time

Better solution?

Build a BF on D.

Check whether F[i,i+z-1] ∈ D: if the BF answers YES, then “warn the user” or explicitly scan D

[Figure: file F, with the window from position i to i+z marked]

O(k*|F|)

or even better...
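
The scan described on this slide can be sketched as follows. This is a hedged illustration, not the slides' implementation: the function names, the SHA-256 based positions, and the use of a Python set as the "explicit scan of D" are all assumptions. The sketch also rehashes every window from scratch, which costs time proportional to z per position, so by itself it does not reach the O(k*|F|) bound.

```python
import hashlib


def _positions(window: bytes, m: int, k: int):
    # Assumption: k positions derived from one SHA-256 digest (double hashing).
    digest = hashlib.sha256(window).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % m for i in range(k)]


def build_filter(signatures, m: int, k: int) -> bytearray:
    """Build the BF on D (one byte per bit, for readability)."""
    bits = bytearray(m)
    for s in signatures:
        for p in _positions(s, m, k):
            bits[p] = 1
    return bits


def scan(data: bytes, signatures: set, z: int, m: int = 1024, k: int = 6):
    """Return every position i such that data[i:i+z] is in signatures."""
    bits = build_filter(signatures, m, k)
    hits = []
    for i in range(len(data) - z + 1):
        window = data[i:i + z]
        # Cheap test first: the BF answers NO for most windows.
        if all(bits[p] for p in _positions(window, m, k)):
            # BF answered YES: confirm explicitly to rule out false positives.
            if window in signatures:
                hits.append(i)
    return hits


if __name__ == "__main__":
    print(scan(b"xxxxEVILyyyyEVIL", {b"EVIL"}, z=4))
```

Dropping the explicit confirmation step corresponds to the "warn the user" option: faster, but an occasional clean window would be flagged.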
