
IR Techniques For P2P Networks


Information Retrieval Techniques For Peer-To-Peer Networks

Demetrios Zeinalipour-Yazti, Vana Kalogeraki and Dimitrios Gunopulos

Presented By Ranjan Dash


Layout

- Introduction
- P2P Network IR Techniques
- PeerWare Infrastructure and experiments


Introduction

- Major challenge: efficiently searching the content of other peers
- Definition: a large number of peers collaborate dynamically in an ad hoc manner and share information in large-scale distributed environments without centralized coordination
- P2P environment characteristics
  - Each peer has a database or collection of documents
  - A query contains a set of keywords
  - A reply message contains pointers to matching documents
- Different from static data environments
  - No central repository
  - Nodes join and leave dynamically, in an ad hoc manner


P2P Network IR Techniques

- Breadth-First Search (BFS)
- Random Breadth-First Search (RBFS)
- Intelligent Search Mechanism (ISM)
- Directed BFS and >RES
- Random Walker Searches
- Randomized Gossiping
- Local Routing Indices
- Centralized Approaches
- Searching Object Identifiers
- Distributed IR


Breadth-First Search (BFS)

- Widely used in file-sharing systems
- Each peer propagates the query to all neighbors except the sender
- The QueryHit message (number of documents, bandwidth information) follows the same path back
- Simple, and guarantees a high hit rate
- Poor performance and network utilization
- A low-bandwidth node becomes a bottleneck
- Can be improved using a time-to-live (TTL) limit
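A minimal sketch of TTL-limited flooding, to make the message cost concrete. The graph representation (a dict of adjacency lists), the `has_match` predicate, and the function name are assumptions for illustration, not part of the original slides.

```python
from collections import deque

def bfs_flood(graph, source, has_match, ttl):
    """Flood a query from `source`; return (peers with hits, messages sent)."""
    hits, messages = set(), 0
    seen = {source}                          # peers that already handled the query
    frontier = deque([(source, None, ttl)])  # (peer, sender, remaining hops)
    while frontier:
        peer, sender, remaining = frontier.popleft()
        if peer != source and has_match(peer):
            hits.add(peer)
        if remaining == 0:
            continue                         # TTL expired: do not forward further
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue                     # never send back to the sender
            messages += 1                    # one Query message on the wire
            if neighbor not in seen:         # duplicate queries are dropped
                seen.add(neighbor)
                frontier.append((neighbor, peer, remaining - 1))
    return hits, messages
```

On a chain a-b-c-d with matches at c and d, a TTL of 2 reaches only c, while a TTL of 3 also reaches d at the cost of one more message.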


Random Breadth-First Search (RBFS)

- Dramatic improvement over BFS
- Each peer forwards the query only to a fraction of its peers, selected at random
- Needs no global knowledge; decisions are local, and therefore fast
- Probabilistic: the query might not reach some large network segments
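The random forwarding step can be sketched as follows. This is an illustrative version only: the `fraction` parameter, the rounding rule (round up, at least one neighbor), and the injectable `rng` are assumptions.

```python
import math
import random
from collections import deque

def rbfs(graph, source, has_match, ttl, fraction=0.5, rng=None):
    """Like flooding, but each peer forwards to a random fraction of its neighbors."""
    rng = rng or random.Random()
    hits, messages = set(), 0
    seen = {source}
    frontier = deque([(source, None, ttl)])
    while frontier:
        peer, sender, remaining = frontier.popleft()
        if peer != source and has_match(peer):
            hits.add(peer)
        if remaining == 0:
            continue
        candidates = [n for n in graph[peer] if n != sender]
        if not candidates:
            continue
        # Assumed rounding rule: round up, and always pick at least one neighbor.
        count = max(1, math.ceil(fraction * len(candidates)))
        for neighbor in rng.sample(candidates, count):
            messages += 1
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, peer, remaining - 1))
    return hits, messages
```

Because the subset is random, two runs of the same query can reach different peers, which is the source of the "might not reach some segments" caveat.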


Intelligent Search Mechanism (ISM)

- Quick and efficient, with low communication cost
- Propagates a query only to the peers most likely to reply
- Consists of 2 components that run in each peer
  - Profile mechanism
  - Relevance rank
- Works well when queries exhibit locality
- Always forwards to the same neighbors, which starves new peers
- Solution: add a small random subset of peers to the most relevant set


Profile mechanism

- Each peer builds a profile for each of its neighboring peers
- The profile holds the T most recent queries, together with the QueryHits and the number of results
- A least-recently-used (LRU) replacement policy keeps the most recent queries
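One way to picture the per-neighbor profile is a small LRU table. The sketch below is an assumption about the data layout (query text mapped to a result count, bounded by a capacity T); the class and method names are hypothetical.

```python
from collections import OrderedDict

class NeighborProfile:
    """Keeps the T most recently seen queries for one neighbor (LRU eviction)."""

    def __init__(self, capacity):
        self.capacity = capacity          # T in the slides
        self.entries = OrderedDict()      # query text -> number of results

    def record(self, query, num_results):
        if query in self.entries:
            self.entries.move_to_end(query)    # refresh recency of a repeated query
        self.entries[query] = num_results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
```

Each peer would hold one such table per neighbor and update it whenever a QueryHit passes through.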


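Relevance rank

- Ranks the neighbors to decide which ones to forward a query to
- Ranking of a peer P_i for a query q, computed by peer P_l:

  R_{P_l}(P_i, q) = \sum_{j \in \text{queries answered by } P_i} Qsim(q_j, q)^{\alpha} \cdot S(P_i, q_j)

- S(P_i, q_j) is the number of results P_i returned for query q_j
- Qsim is the cosine similarity between 2 queries
- \alpha = 0: only the number of results returned in the past matters, as in >RES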
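A sketch of how such a rank might be computed, assuming the score sums, over a neighbor's past queries, the query similarity raised to a power `alpha` times the number of results returned. Treating queries as bags of whitespace-separated keywords is an additional simplification.

```python
import math
from collections import Counter

def qsim(q1, q2):
    """Cosine similarity between two keyword queries (term-count vectors)."""
    v1, v2 = Counter(q1.lower().split()), Counter(q2.lower().split())
    dot = sum(v1[t] * v2[t] for t in v1)
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0

def relevance_rank(profile, query, alpha=1.0):
    """profile: iterable of (past_query, num_results) pairs for one neighbor."""
    # With alpha = 0 every past query has weight 1, so only result counts matter.
    return sum(qsim(past, query) ** alpha * results for past, results in profile)
```

A peer would compute this rank for every neighbor and forward the query to the highest-ranked ones, plus a few random picks to avoid starving new peers.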


Directed BFS and >RES

- Each peer forwards a query to a subset of its peers, chosen from aggregated statistics
- >RES sends the query to the k peers that returned the most results for the last m queries
- With k = 1 and m = 10, the BFS turns into a depth-first search (DFS)
- Similar to ISM, but simpler
- Does not explore nodes whose content is related to the query
- Performs well because it routes queries to the larger network segments
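The >RES selection rule can be sketched with a bounded history per neighbor. Keeping the last m result counts per neighbor, and breaking ties by name, are assumptions made here for a deterministic example.

```python
from collections import deque

class ResultsHistory:
    """Tracks result counts of the last m queries for each neighbor."""

    def __init__(self, m):
        self.m = m
        self.history = {}   # neighbor -> deque of result counts, newest last

    def record(self, neighbor, num_results):
        self.history.setdefault(neighbor, deque(maxlen=self.m)).append(num_results)

    def top_k(self, k):
        """Return the k neighbors with the most results in their recent history."""
        totals = {n: sum(counts) for n, counts in self.history.items()}
        return sorted(totals, key=lambda n: (-totals[n], n))[:k]
```

Unlike the ISM rank, this selection ignores what the query is about: it is purely quantitative.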


Random-Walker Searches

- Each node forwards a query message, called a walker, to one of its peers chosen at random
- Can be extended from 1 walker to k walkers
- Resembles RBFS, but the number of messages increases linearly
- Like RBFS, does not use the most relevant content to guide the query
- Adaptive Probabilistic Search (APS) is similar: it uses feedback from previous searches to probabilistically guide future walkers
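A simplified k-walker sketch, showing why the message count grows linearly (at most k times the walk length). Running the walkers one after another and stopping each after a fixed `max_steps` are simplifying assumptions; real walkers run in parallel and use their own termination checks.

```python
import random

def k_walker_search(graph, source, has_match, k, max_steps, rng=None):
    """Launch k independent random walkers; return (peers with hits, messages)."""
    rng = rng or random.Random()
    hits, messages = set(), 0
    for _walker in range(k):
        current = source
        for _step in range(max_steps):
            neighbors = graph[current]
            if not neighbors:
                break                         # dead end: this walker stops
            current = rng.choice(neighbors)   # forward to one random peer
            messages += 1
            if current != source and has_match(current):
                hits.add(current)
    return hits, messages
```

An APS-style variant would replace the uniform `rng.choice` with a weighted choice driven by feedback from earlier searches.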


Randomized Gossiping: PlanetP

- Global inverted index, partially constructed by each node
- Each node summarizes its local index as a Bloom filter
- Each node propagates its filter to the rest of the network through gossiping
- Advantages of the Bloom filter
  - Smaller messages
  - Savings in network I/O
- Scalability is a problem for PlanetP
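To show why a Bloom filter keeps gossip messages small, here is a minimal generic Bloom filter. It is not PlanetP's implementation; the bit-array size, the number of hash functions, and the SHA-256-based hashing are assumptions.

```python
import hashlib

class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0                     # a Python int used as a bit array

    def _positions(self, term):
        # Derive several bit positions by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        return all((self.bits >> pos) & 1 for pos in self._positions(term))
```

A peer would add every term of its local index and gossip only the fixed-size bit array, so the message size does not grow with the number of terms.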


Local Routing Indices (Arturo Crespo and Hector Garcia-Molina)

- Hybrid technique that uses local indices containing the "direction" toward the documents
- 3 techniques
  - Compound routing index (CRI): a node maintains statistics for each neighbor that indicate how many documents are reachable through that neighbor
  - Hop-count routing index (HRI): a CRI for each of k hops, with prohibitive storage cost for large k
  - Exponentially aggregated index (ERI): addresses the storage issue of HRI by aggregating the HRI with a cost formula
- Good for topologies where only a few nodes have very large numbers of neighbors (tree, tree with cycles)
- The routing indices are similar to the routing tables used by the Bellman–Ford algorithm
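A rough sketch of how a compound routing index could steer a query. The table layout (neighbor mapped to per-topic document counts) and the "goodness" measure used here, a plain sum of counts, are simplifying assumptions rather than the estimator from the routing-indices work.

```python
def best_neighbors(routing_index, topics):
    """Order neighbors by how many documents on `topics` lie in their direction.

    routing_index: neighbor -> {topic: documents reachable through that neighbor}
    """
    def goodness(neighbor):
        counts = routing_index[neighbor]
        return sum(counts.get(topic, 0) for topic in topics)

    # Highest goodness first; ties broken by neighbor name for determinism.
    return sorted(routing_index, key=lambda n: (-goodness(n), n))
```

A node would forward the query to the first neighbor in this order and fall back to the next one if too few results come back.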


Centralized Approaches

- Maintain an inverted index over all the documents in the participating hosts' collections (Google, Yahoo, Napster)
- Each joining peer A uploads an index of all its shared documents to the central repository R
- A querying node B searches A's documents through R
- B can then communicate with A directly (using an out-of-band protocol such as HTTP)
- Kazaa is slightly different from the rest: it uses a set of more powerful peers that act as central repositories
- Simple, robust, short search time, guaranteed to find all results
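The upload-then-search flow can be sketched with an in-memory inverted index. The class name, the whitespace tokenization, and the AND semantics for multi-keyword queries are assumptions for illustration.

```python
from collections import defaultdict

class CentralRepository:
    """Central inverted index: keyword -> set of (peer, document name)."""

    def __init__(self):
        self.index = defaultdict(set)

    def upload(self, peer, documents):
        """documents: document name -> text, as shared by a joining peer."""
        for name, text in documents.items():
            for word in text.lower().split():
                self.index[word].add((peer, name))

    def search(self, query):
        """Return (peer, document) pairs that contain every query keyword."""
        words = query.lower().split()
        if not words:
            return set()
        results = set(self.index.get(words[0], set()))
        for word in words[1:]:
            results &= self.index.get(word, set())
        return results
```

The search result only points at the owning peer; the actual download would then happen directly between the two peers.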


Searching Object Identifiers

- Distributed file indexing systems: Chord, OceanStore, Content-Addressable Network (CAN), Freenet
- Efficient searches using object identifiers (a hash code of the file name) rather than keywords
- An object lookup operation returns the address (an IP address) of the node that stores the object
- Optimizes object retrieval by minimizing the number of messages and hops required
- Disadvantage: can only search for object identifiers, and thus cannot capture the relevance of a document
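A simplified illustration of identifier-based lookup: hash the file name to an identifier, then find the node responsible for it on an identifier ring. This sketch assumes a globally known node list and skips the hop-by-hop routing that systems such as Chord perform; the 16-bit identifier space is also an assumption.

```python
import hashlib
from bisect import bisect_left

def object_id(name, bits=16):
    """Map a file name to an identifier in [0, 2**bits)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def responsible_node(node_ids, key):
    """Return the first node id >= key on the ring, wrapping around at the end."""
    ring = sorted(node_ids)
    index = bisect_left(ring, key)
    return ring[index % len(ring)]
```

Because the identifier depends on the exact name, a lookup for a slightly different name lands on an unrelated identifier, which is why keyword relevance cannot be captured this way.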


Distributed IR

- With distributed databases, the main IR problem is deciding which databases are most likely to contain the most relevant documents
- Good results are achievable for conceptually separated collections
- However, this assumes that the querying party has some statistical knowledge about each database's contents (word frequencies in documents) and therefore has a global view of the system
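A toy sketch of database selection from per-database word statistics. The scoring rule (the sum, over query terms, of the fraction of documents containing the term) is an assumption chosen for brevity, not a specific published selection algorithm.

```python
def rank_databases(stats, query):
    """Order databases by how likely they are to hold documents for `query`.

    stats: database name -> (number of documents, {term: documents containing term})
    """
    terms = query.lower().split()

    def score(db):
        num_docs, doc_freq = stats[db]
        if num_docs == 0:
            return 0.0
        return sum(doc_freq.get(term, 0) / num_docs for term in terms)

    # Highest score first; ties broken by name for determinism.
    return sorted(stats, key=lambda db: (-score(db), db))
```

The function needs the statistics of every database up front, which is exactly the global view that a P2P setting lacks.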


PeerWare Infrastructure and experiments

- Evaluation metrics
  - Recall rate: the fraction of documents each of the search mechanisms retrieves
  - Efficiency: the number of messages needed to find the results
- Implemented only algorithms that require local knowledge when searching for documents
  - BFS (the baseline)
  - RBFS, >RES (k = 0.5 * d and m = 100, where d is the degree of a node), and ISM
- These 3 techniques forward query messages to half the neighbors that BFS contacts
- >RES and ISM use previous knowledge to decide which peers to forward the query to
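The two metrics can be written down in a few lines. Measuring recall against a known set of relevant documents, and efficiency as a ratio to a baseline message count, are assumptions about how the numbers would be normalized.

```python
def recall_rate(retrieved, relevant):
    """Fraction of the relevant documents that a search mechanism retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)

def message_ratio(messages_used, baseline_messages):
    """Messages used, relative to a baseline such as BFS."""
    return messages_used / baseline_messages
```

For example, retrieving 2 of 4 relevant documents gives a recall rate of 0.5, and 38 messages against a baseline of 100 gives a ratio of 0.38.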


BFS requires almost 2.5 times as many messages as its competitors.


- ISM found the most documents
- ISM achieved almost a 90-percent recall rate while using only 38 percent of the messages BFS required
- ISM improves its knowledge over time
- Both >RES and ISM started out with a low recall rate (around 40 to 50 percent) because they initially choose their neighbors at random