Taxonomy Caching: A Scalable Low-Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems

Kjetil Nørvåg
Norwegian University of Science and Technology
Trondheim, Norway

Christos Doulkeridis and Michalis Vazirgiannis
Athens University of Economics and Business
Athens, Greece

June 28, 2006 ICPS'2006 2

Outline

Motivation and example application

Taxonomies and taxonomy-based querying

Taxonomy-based query routing

Taxonomy caching: architecture and maintenance

Experimental results

Summary and further work

Motivation

Mobile devices: high storage capacity and wireless support

Contain multimedia documents that can be shared

Possibly other data/services:

– Temperature or other environmental data

Important challenge: find the files and services!

Problem:

– Dynamic contents, location, and visibility

– Limited bandwidth

Centralized indexing/search engines not applicable

Solution: P2P network & search

Example application: MobiShare

Devices share resources by hosting web services

Each device is connected to a CAS

CASs are connected P2P

[More details in Valavanis et al., Web Intelligence’2003]

[Figure: MobiShare architecture showing Cells A, B, and C, each with a wireless network access point and a CAS, and the Internet]

Outline of basic idea

1) Describe contents according to taxonomy

2) Taxonomy info cached at remote peers

3) Use cached knowledge to route queries to appropriate peers

Why?

1) Should reduce latency

2) Increase recall with same cost

Resource description

[Figure: example taxonomy]

Travel
  Transportation
    Air: Package tours, Scheduled flights, Helicopter
    Train
    Boat
  Accommodation: Camping, Hotel, Motel
  Food: Restaurant, Grocery store
  ...

Taxonomy-based resource description

Also applicable for audio/video

More than one taxonomy might exist in the system

Resource description: taxonomy ID and set of categories

Taxonomy-based querying

Query:

1) Request for all resources belonging to category Cj

or

2) Request for all resources belonging to category Cj and satisfying some additional property

Example properties: Text contents, metadata

Searching in unstructured P2P networks

Basic search technique: local execution of query, then forwarding if TTL>0

– Naïve flooding (all neighbors)

– Normalized flooding (only K neighbors)

– Random walks: only one random neighbor, but W walks initiated

Problem: Only a limited # of peers can be searched (query horizon)

Possible improvements:

– Routing indices

– Summary indexing (Bloom filters etc.)

– Result caching

However: Still limited scalability and coverage
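
The three basic search techniques can be sketched as follows. This is a minimal illustration, not the simulator used in the talk: the adjacency-dict graph format, the function names, and the rule that a peer drops a query it has already seen are all assumptions.

```python
import random
from collections import deque

def flood(graph, start, ttl, k=None, rng=None):
    """TTL-limited flooding; returns (peers reached, messages sent).

    graph: dict mapping peer -> list of neighbors (assumed format).
    k=None -> naive flooding (all neighbors);
    k=int  -> normalized flooding (at most k random neighbors per hop).
    """
    rng = rng or random.Random(0)
    visited = {start}
    messages = 0
    queue = deque([(start, ttl, None)])
    while queue:
        peer, remaining, sender = queue.popleft()
        if remaining <= 0:          # query executed locally, not forwarded
            continue
        candidates = [n for n in graph[peer] if n != sender]
        if k is not None and len(candidates) > k:
            candidates = rng.sample(candidates, k)
        for n in candidates:
            messages += 1
            if n not in visited:    # assumption: duplicates are dropped
                visited.add(n)
                queue.append((n, remaining - 1, peer))
    return visited, messages

def random_walks(graph, start, ttl, walks, rng=None):
    """W independent walks, each forwarding to one random neighbor per hop."""
    rng = rng or random.Random(0)
    visited = {start}
    messages = 0
    for _ in range(walks):
        peer, sender = start, None
        for _ in range(ttl):
            # Avoid stepping straight back unless there is no other choice.
            candidates = [n for n in graph[peer] if n != sender] or graph[peer]
            if not candidates:
                break
            nxt = rng.choice(candidates)
            messages += 1
            visited.add(nxt)
            peer, sender = nxt, peer
    return visited, messages
```

In all three variants the set of reachable peers is bounded by the TTL, which is the query horizon mentioned above.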

Taxonomy caching

Basic idea:

– Maintain taxonomic information about remote contents in a taxonomy cache (TCache)

Mapping from taxonomic concept to set of peers

Advantages:

– Cheaper to maintain than full-text index

– More applicable to multimedia data

– More robust wrt. changes in contents

Used to improve query routing: higher recall and reduced latency

Query routing using taxonomy cache (TCache)

1) Basis: one of traditional routing strategies

2) Query forward peers: PF

3) Starting point: PF = neighbors = PN = {PN1,…,PNn}

4) Lookup in TCache: Lookup(category) → PC = {PC1,…,PCm}

5) PF = PN+PC

6) Query forwarded to (subset of) PF
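
Steps 2 to 5 can be sketched in a few lines. This is only an illustration under assumptions: the TCache is modelled as a plain dict from category to a list of peers, the function name `forward_set` is hypothetical, and excluding the peer the query came from is an assumed detail.

```python
def forward_set(neighbors, tcache, category, previous=None):
    """Build the candidate forward set PF = PN + PC for one hop.

    neighbors: list of neighbor peer IDs (PN).
    tcache:    dict category -> list of peer IDs (assumed TCache shape).
    previous:  peer the query arrived from; not used as a target (assumption).
    """
    pn = [p for p in neighbors if p != previous]
    pc = tcache.get(category, [])      # Lookup(category) -> PC
    pf = list(pn)
    for p in pc:                       # union without duplicates
        if p not in pf and p != previous:
            pf.append(p)
    return pf

# Usage: p9 is not a neighbor, so forwarding to it would be a "jump".
pf = forward_set(["p1", "p2", "p3"], {"Travel/Air": ["p9", "p2"]},
                 "Travel/Air", previous="p1")
```

Step 6 then picks a subset of PF according to the chosen forwarding strategy.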

Query forwarding alternatives (1)

Query forward peers: PF

# of neighbors (excl. previous): Nn

# of matches from lookup: Nc

Ranking of peers in PC:

– Based on # of resources within a category

– High # of resources: considered experts

TCB:

– Highest ranked in PC + the Nn neighbors in {PN1,…,PNn}

– Forwarding to peer in PC called jump

– Jump can be to peer beyond query horizon!

TCA:

– If Nc ≥ Nn: forward to Nn highest ranked peers in PC

– If Nc < Nn: forward to all Nc peers in PC + (Nn-Nc) randomly selected neighbors
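
A possible reading of TCB and TCA in code, as a sketch only. PC is modelled as a dict from peer to its # of resources in the category; the function names, the tie-break by peer ID, and the interpretation of "highest ranked in PC" as the single top peer are assumptions.

```python
import random

def rank(pc):
    """Peers in PC sorted by descending # of resources (experts first)."""
    return sorted(pc, key=lambda p: (-pc[p], p))

def tcb(neighbors, pc):
    """TCB: all Nn neighbors plus the highest-ranked peer in PC (a jump)."""
    ranked = rank(pc)
    targets = list(neighbors)
    if ranked and ranked[0] not in targets:
        targets.append(ranked[0])
    return targets

def tca(neighbors, pc, rng=None):
    """TCA: Nn best peers in PC, topped up with random neighbors if Nc < Nn."""
    rng = rng or random.Random(0)
    ranked = rank(pc)
    nn, nc = len(neighbors), len(ranked)
    if nc >= nn:
        return ranked[:nn]
    fill = [n for n in neighbors if n not in ranked]
    return ranked + rng.sample(fill, min(nn - nc, len(fill)))
```

Under this reading TCB sends one message more than plain flooding per hop, while TCA keeps the number of forwarded messages at Nn.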

Query forwarding alternatives (2)

TCCN:

– If Nc ≥ Nn: forward to all Nc peers in PC

– If Nc < Nn: forward to all Nc peers in PC + (Nn-Nc) neighbors

TCDN:

– If Nc ≥ Nn: forward to Nn/2 highest ranked peers in PC + random selection of Nn/2 other peers in PC

– If Nc < Nn: forward to all Nc peers in PC + (Nn-Nc) neighbors
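
TCCN and TCDN can be sketched the same way. Several details here are assumptions rather than statements from the slides: PC is a dict from peer to # of resources, the (Nn-Nc) fill-in neighbors are picked at random, and for odd Nn the top half is rounded down so that the total stays at Nn.

```python
import random

def _ranked(pc):
    """Peers in PC sorted by descending # of resources."""
    return sorted(pc, key=lambda p: (-pc[p], p))

def _fill(ranked, neighbors, rng):
    """All peers in PC plus (Nn-Nc) neighbors, chosen randomly (assumption)."""
    extra = [n for n in neighbors if n not in ranked]
    need = min(len(neighbors) - len(ranked), len(extra))
    return ranked + rng.sample(extra, need)

def tccn(neighbors, pc, rng=None):
    """TCCN: forward to every peer in PC when Nc >= Nn."""
    rng = rng or random.Random(0)
    ranked = _ranked(pc)
    if len(ranked) >= len(neighbors):
        return ranked
    return _fill(ranked, neighbors, rng)

def tcdn(neighbors, pc, rng=None):
    """TCDN: half experts, half randomly selected other peers in PC."""
    rng = rng or random.Random(0)
    ranked = _ranked(pc)
    nn = len(neighbors)
    if len(ranked) >= nn:
        top = ranked[: nn // 2]
        rest = ranked[nn // 2:]
        return top + rng.sample(rest, nn - len(top))
    return _fill(ranked, neighbors, rng)
```

The random half in TCDN spreads load beyond the few top-ranked experts, at the price of sometimes contacting peers with fewer matching resources.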

Distributing taxonomic information

Basic mechanism: piggyback matching category with query result

– Result returned through original path, possibly involving jumps

– Makes revalidation of contents in intermediate TCaches possible

– Coverage will be gradually extended (beyond query horizon)

Lazy distribution by gossiping also possible
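
The piggybacking mechanism might look roughly like this. It is a sketch under assumptions: each peer's TCache is a nested dict, `return_result` is a hypothetical name, and the start TTL of 10 is an arbitrary illustrative value.

```python
def return_result(path, tcaches, category, answering_peer, n_resources,
                  start_ttl=10):
    """Send a result back along the query path, updating each TCache.

    path:    peers the query travelled through, querying peer first.
    tcaches: dict peer -> {category -> {peer -> entry}} (assumed layout).
    Every peer on the path inserts or revalidates the piggybacked entry.
    """
    for peer in reversed(path):  # last hop first, back to the querying peer
        entry = tcaches.setdefault(peer, {}).setdefault(category, {})
        entry[answering_peer] = {"resources": n_resources, "ttl": start_ttl}
    return tcaches

# Usage: p7 answered a query that travelled q -> p1 -> p2.
caches = return_result(["q", "p1", "p2"], {}, "Travel/Air", "p7", 12)
```

After this call, q knows about p7 even if p7 lies outside q's query horizon, which is how coverage is gradually extended.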

TCache architecture and maintenance

Aim: Provide efficient mapping C → {PC1,…,PCm}

For each category: peers, # of resources, and TTL

TTL:

– Regularly decremented

– Reset to start value at revalidation

Caching policy: aggressive vs. selective

Compacting techniques: peer upgrade & non-expert pruning

[Figure: TCache structure. Each taxonomy node (Transportation, Air, Train, Boat, Package tours, Scheduled flights, Helicopter) holds a set {(P,#,T)} of peer, # of resources, and TTL entries]
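
A minimal sketch of such a cache, storing (P,#,T) entries per category. The class and method names are hypothetical, and dropping an entry once its TTL reaches zero is an assumption; the slides only state that the TTL is decremented regularly and reset at revalidation. Caching policy and compacting are not modelled.

```python
class TCache:
    """Maps a category to {peer: [# of resources, TTL]} (assumed layout)."""

    def __init__(self, start_ttl=3):
        self.start_ttl = start_ttl
        self.entries = {}

    def put(self, category, peer, n_resources):
        """Insert or revalidate an entry; TTL is reset to its start value."""
        self.entries.setdefault(category, {})[peer] = [n_resources,
                                                       self.start_ttl]

    def tick(self):
        """Decrement every TTL; drop entries that reach zero (assumption)."""
        for category in list(self.entries):
            peers = self.entries[category]
            for peer in list(peers):
                peers[peer][1] -= 1
                if peers[peer][1] <= 0:
                    del peers[peer]
            if not peers:
                del self.entries[category]

    def lookup(self, category):
        """Return peers for a category, highest # of resources first."""
        peers = self.entries.get(category, {})
        return sorted(peers, key=lambda p: (-peers[p][0], p))
```

Calling `put` again for a known peer acts as revalidation, so peers that keep answering queries stay cached while stale ones age out.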

Experimental setup

Simulations

Excerpts of DMOZ taxonomy

Synthetic network topologies

Resource allocation: 80/20 rule

Queries are taxonomic categories

A number of peers have role as querying peers

Measured: contacted peers, messages, recall, and latency

In this presentation: results using flooding and TCDN query routing

Improvements in recall

          NM (F)    NM (TC)   Recall (F)   Recall (TC)
TTL=1        7.8        7.0       0.0022        0.0019
TTL=3      166.7      166.0       0.0117        0.0149
TTL=5      524.7      523.9       0.0282        0.0717
TTL=7     1058.6     1057.7       0.0506        0.1835
TTL=9     1721.0     1719.6       0.0773        0.2930
TTL=11    2566.3     2566.0       0.1104        0.4012
TTL=13    3536.5     3535.8       0.1477        0.4891
TTL=15    4560.2     4558.7       0.1864        0.5755
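
To make the gain easier to read, the relative change in recall can be computed from the Recall (F) and Recall (TC) columns above. The formula (TC - F) / F is one common way to express improvement and is an assumption here, not necessarily the measure used by the authors.

```python
# Recall values copied from the table above: TTL -> (Recall (F), Recall (TC)).
RECALL = {
    1: (0.0022, 0.0019),
    3: (0.0117, 0.0149),
    5: (0.0282, 0.0717),
    7: (0.0506, 0.1835),
    9: (0.0773, 0.2930),
    11: (0.1104, 0.4012),
    13: (0.1477, 0.4891),
    15: (0.1864, 0.5755),
}

def improvement(recall_f, recall_tc):
    """Relative recall improvement in percent (assumed definition)."""
    return (recall_tc - recall_f) / recall_f * 100.0

for ttl, (f, tc) in RECALL.items():
    print(f"TTL={ttl:<2d}  improvement = {improvement(f, tc):7.1f} %")
```

With this definition the change is slightly negative at TTL=1 and roughly 209 % at TTL=15, while the message counts in the NM columns stay almost identical.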

Primary reason for improvement: more intelligent query forwarding

          NC (F)    NC (TC)   Recall (F)   Recall (TC)
TTL=1        7.8        6.7       0.0022        0.0019
TTL=3       45.3       53.4       0.0117        0.0149
TTL=5      110.6      158.0       0.0282        0.0717
TTL=7      199.9      346.8       0.0506        0.1835
TTL=9      305.6      583.1       0.0773        0.2930
TTL=11     437.7      840.3       0.1104        0.4012
TTL=13     586.7     1120.6       0.1477        0.4891
TTL=15     741.6     1372.4       0.1864        0.5755

Improvement and scalability

[Figure: % improvement (y-axis, 0 to 400) versus TTL (x-axis, 5 to 20) for the series N1000, N2000, and N3000]

Latency reduction

TCache results in very fast retrieval of the first results

Finding all results: approximately similar performance, because flooding is used in both techniques

Summary and further work

Presented motivation and context

Taxonomy-based querying and query routing

TCache architecture and maintenance

Experimental results proving our claims

Future/ongoing work:

– Employing the techniques for XML/XPath querying in P2P context (to appear at IEEE P2P’2006)

– Integration of different taxonomies