Taxonomy Caching: A Scalable Low-Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems
Kjetil Nørvåg, Norwegian University of Science and Technology, Trondheim, Norway
Christos Doulkeridis and Michalis Vazirgiannis, Athens University of Economics and Business, Athens, Greece
June 28, 2006, ICPS'2006
Outline
Motivation and example application
Taxonomies and taxonomy-based querying
Taxonomy-based query routing
Taxonomy caching: architecture and maintenance
Experimental results
Summary and further work
Motivation
Mobile devices: high storage capacity and wireless support
Contain multimedia documents that can be shared
Possibly other data/services:
– Temperature or other environmental data
Important challenge: find the files and services!
Problem:
– Dynamic contents, location, and visibility
– Limited bandwidth
Centralized indexing/search engines not applicable
P2P network & search
Example application: MobiShare
Devices share resources by hosting web services
Each device is connected to a CAS
CASs are connected in a P2P network
[More details in Valavanis et al., Web Intelligence’2003]
[Figure: Cells A, B and C, each with a wireless network access point and a CAS; the CASs are connected through the Internet]
Outline of basic idea
1) Describe contents according to taxonomy
2) Taxonomy info cached at remote peers
3) Use cached knowledge to route queries to appropriate peers
Why?
1) Should reduce latency
2) Increase recall with same cost
Resource description
[Figure: example taxonomy tree rooted at Travel. Node labels: Transportation, Accommodation, Food; Air, Train, Boat; Package tours, Scheduled flights, Helicopter; Camping, Hotel, Motel; Restaurant, Grocery store; ...]
Taxonomy-based resource description
Also applicable for audio/video
More than one taxonomy might exist in the system
Resource description: Taxonomy ID and set of categories
Taxonomy-based querying
Query:
1) Request for all resources belonging to category Cj
or
2) Request for all resources belonging to category Cj and satisfying some additional property
Example properties: Text contents, metadata
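As a rough illustration of the two query forms, the sketch below models a resource description as a taxonomy ID plus a set of categories, and a query as a category with an optional extra property. The names `Resource`, `matches` and `metadata`, the sample data, and the choice of exact category matching (no subcategory expansion) are assumptions made for this example; the slides do not prescribe them.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Resource:
    """Hypothetical resource description: taxonomy ID plus a set of categories."""
    name: str
    taxonomy_id: str
    categories: frozenset
    metadata: dict = field(default_factory=dict)


def matches(resource: Resource, taxonomy_id: str, category: str,
            prop: Optional[Callable[[Resource], bool]] = None) -> bool:
    """True if the resource belongs to `category` (query form 1) and,
    when `prop` is given, also satisfies it (query form 2)."""
    if resource.taxonomy_id != taxonomy_id or category not in resource.categories:
        return False
    return prop is None or prop(resource)


# Illustrative data only.
resources = [
    Resource("hotel-list.html", "travel", frozenset({"Hotel"}), {"city": "Athens"}),
    Resource("motel-guide.pdf", "travel", frozenset({"Motel"}), {"city": "Trondheim"}),
    Resource("hotel-deals.html", "travel", frozenset({"Hotel"}), {"city": "Trondheim"}),
]

# Query form 1: all resources belonging to category "Hotel".
type1 = [r.name for r in resources if matches(r, "travel", "Hotel")]

# Query form 2: category "Hotel" plus an additional metadata property.
type2 = [r.name for r in resources
         if matches(r, "travel", "Hotel",
                    lambda res: res.metadata.get("city") == "Trondheim")]
```

Carrying the taxonomy ID in both the description and the query keeps the sketch compatible with several taxonomies coexisting in the system.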
Searching in unstructured P2P networks
Basic search technique: Local execution of query, then forwarding if TTL>0
– Naïve flooding (all neighbors)
– Normalized flooding (only K neighbors)
– Random walks: only one random neighbor, but W walks initiated
Problem: Only a limited # of peers can be searched (query horizon)
Possible improvements:
– Routing indices
– Summary indexing (Bloom filters etc.)
– Result caching
However: Still limited scalability and coverage
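The basic search techniques and the query horizon can be sketched as follows. This is a simplified model rather than the simulator used for the talk: the adjacency-dict graph, the function names and the breadth-first bookkeeping are assumptions, and a single call with `"random_walk"` performs one walk (a caller would start W of them).

```python
import random
from collections import deque


def select_neighbors(neighbors, strategy, k=2, rng=random):
    """Pick the neighbors a query is forwarded to under one basic strategy."""
    neighbors = list(neighbors)
    if strategy == "naive_flooding":
        return neighbors                                        # all neighbors
    if strategy == "normalized_flooding":
        return rng.sample(neighbors, min(k, len(neighbors)))    # only K neighbors
    if strategy == "random_walk":
        return [rng.choice(neighbors)] if neighbors else []     # one random neighbor
    raise ValueError(f"unknown strategy: {strategy}")


def search(graph, start, ttl, strategy, k=2, rng=random):
    """Return the set of peers reached: local execution, then forwarding while TTL > 0."""
    reached = {start}
    queue = deque([(start, ttl, None)])
    while queue:
        peer, remaining, previous = queue.popleft()
        if remaining <= 0:
            continue                                            # query horizon reached
        candidates = [n for n in graph[peer] if n != previous]
        for nxt in select_neighbors(candidates, strategy, k, rng):
            # A random walk may revisit peers; flooding skips peers already reached.
            if strategy == "random_walk" or nxt not in reached:
                reached.add(nxt)
                queue.append((nxt, remaining - 1, peer))
    return reached


# A chain of five peers: with TTL=2, peers d and e lie beyond the query horizon.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
horizon = search(graph, "a", 2, "naive_flooding")
```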
Taxonomy caching
Basic idea:
– Maintain taxonomic information about remote contents in a taxonomy cache (TCache)
Mapping from taxonomic concept to set of peers
Advantages:
– Cheaper to maintain than full-text index
– More applicable to multimedia data
– More robust wrt. changes in contents
Used to improve query routing
Higher recall and reduced latency
Query routing using taxonomy cache (TCache)
1) Basis: one of traditional routing strategies
2) Query forward peers: PF
3) Starting point: PF = neighbors = PN = {PN1,…,PNn}
4) Lookup in TCache: Lookup(category) → PC = {PC1,…,PCm}
5) PF = PN + PC
6) Query forwarded to (subset of) PF
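Steps 3 to 5 can be sketched in a few lines. The dict-based TCache, the function name `forward_peers` and the removal of duplicates when combining PN and PC are assumptions for illustration; step 6 (choosing the subset of PF) is left to the forwarding alternative in use.

```python
def forward_peers(neighbors, tcache, category, previous=None):
    """Build the candidate set PF from the neighbors PN and the TCache lookup PC."""
    pn = [p for p in neighbors if p != previous]     # PN, excluding the previous hop
    pc = list(tcache.get(category, []))              # PC = Lookup(category)
    return pn + [p for p in pc if p not in pn]       # PF = PN + PC, no duplicates


# Hypothetical TCache content: category -> peers known to hold matching resources.
tcache = {"Hotel": ["p7", "p2"]}
pf = forward_peers(["p1", "p2", "p3"], tcache, "Hotel", previous="p1")
```

In this example `p7` is not a neighbor, so forwarding to it would reach a peer that plain neighbor-based routing might never contact.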
Query forwarding alternatives (1)
Query forward peers: PF
# of neighbors (excl. previous): Nn
# matches from lookup: Nc
Ranking of peers in PC:
– Based on # of resources within a category
– High # of resources: considered experts
TCB:
– Highest ranked in PC + the Nn neighbors in {PN1,…,PNn}
– Forwarding to peer in PC called jump
– Jump can be to peer beyond query horizon!
TCA:
– If Nc ≥ Nn: forward to Nn highest ranked peers in PC
– If Nc < Nn: forward to all Nc peers in PC + (Nn-Nc) randomly selected neighbors
Query forwarding alternatives (2)
TCCN:
– If Nc ≥ Nn: forward to all Nc peers in PC
– If Nc < Nn: forward to all Nc peers in PC + (Nn-Nc) neighbors
TCDN:
– If Nc ≥ Nn: forward to Nn/2 highest ranked peers in PC + random selection of Nn/2 other peers in PC
– If Nc < Nn: forward to all Nc peers in PC + (Nn-Nc) neighbors
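The four forwarding alternatives (TCB, TCA, TCCN, TCDN) can be compared side by side in one selection function. This is a minimal sketch under stated assumptions: the slides do not say how the extra (Nn-Nc) neighbors are chosen for TCCN and TCDN, so they are drawn at random here, skipping peers already in PC, and Nn/2 is taken as integer division. The function name and argument layout are hypothetical.

```python
import random

STRATEGIES = {"TCB", "TCA", "TCCN", "TCDN"}


def choose_targets(strategy, neighbors, ranked_pc, rng=random):
    """Return the peers a query is forwarded to.

    neighbors -- the Nn neighbors (previous hop already excluded)
    ranked_pc -- the Nc peers from the TCache lookup, highest ranked first
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    nn, nc = len(neighbors), len(ranked_pc)

    def fill_with_neighbors(count):
        # Assumption: random neighbors, skipping peers that are already in PC.
        pool = [p for p in neighbors if p not in ranked_pc]
        return rng.sample(pool, min(count, len(pool)))

    if strategy == "TCB":
        # Highest-ranked cached peer (a possible jump) plus all Nn neighbors.
        jump = [p for p in ranked_pc[:1] if p not in neighbors]
        return jump + list(neighbors)
    if nc < nn:
        # Shared by TCA, TCCN and TCDN: all Nc cached peers + (Nn - Nc) neighbors.
        return list(ranked_pc) + fill_with_neighbors(nn - nc)
    if strategy == "TCA":
        return list(ranked_pc[:nn])                  # Nn highest-ranked peers in PC
    if strategy == "TCCN":
        return list(ranked_pc)                       # all Nc peers in PC
    # TCDN: Nn/2 highest-ranked peers + Nn/2 randomly selected other peers in PC.
    half = nn // 2
    rest = list(ranked_pc[half:])
    return list(ranked_pc[:half]) + rng.sample(rest, min(half, len(rest)))


neighbors = ["n1", "n2", "n3", "n4"]                 # Nn = 4
ranked_pc = ["e1", "e2", "e3", "e4", "e5", "e6"]     # Nc = 6, best first
tcb = choose_targets("TCB", neighbors, ranked_pc)
tca = choose_targets("TCA", neighbors, ranked_pc)
tccn = choose_targets("TCCN", neighbors, ranked_pc)
tcdn = choose_targets("TCDN", neighbors, ranked_pc)
```

With Nc ≥ Nn, TCA and TCDN keep the fan-out at Nn, while TCCN forwards to every cached peer and so can send more messages per hop.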
Distributing taxonomic information
Basic mechanism: piggyback matching category with query result
– Result returned through original path, possibly involving jumps
– Makes revalidation of contents in intermediate TCaches possible
– Coverage will be gradually extended (beyond query horizon)
Lazy distribution by gossiping also possible
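The piggybacking mechanism might look roughly like the sketch below: the result travels back along the query path, and each peer on that path records (or revalidates) which peer provided resources for the category. The nested-dict layout, the function name `return_result` and the sample peers are assumptions, not the message format of the actual system.

```python
def return_result(path, category, provider, num_resources, tcaches):
    """Send a result back along `path` (querying peer first, provider last),
    piggybacking the matching category so each peer on the path can cache it."""
    for peer in reversed(path[:-1]):                 # the provider needs no entry for itself
        per_category = tcaches.setdefault(peer, {}).setdefault(category, {})
        per_category[provider] = num_resources       # insert, or revalidate an old entry
    return tcaches


# Peer "q" issued the query; it travelled q -> a -> b -> p, and p holds 12 matches.
tcaches = return_result(["q", "a", "b", "p"], "Hotel", "p", 12, {})
```

After this single result, `q` knows about `p` even though `p` is three hops away, which is how coverage can grow beyond the query horizon.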
TCache architecture and maintenance
Aim: Provide efficient mapping C → {PC1,…,PCm}
For each category: Peers, # of resources, and TTL
TTL:
– Regularly decremented
– Reset to start value at revalidation
Caching policy: Aggressive vs. selective
Compacting techniques: Peer upgrade & non-expert pruning
[Figure: TCache structure. Each taxonomy category (Transportation, Air, Train, Boat, Package tours, Scheduled flights, Helicopter) holds a set of entries {(P,#,T)}: peer, # of resources, TTL]
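A minimal in-memory version of this structure, mapping each category to (peer, # of resources, TTL) entries, could be written as below. The class and method names are hypothetical, and dropping an entry when its TTL reaches 0 is an assumption, since the slides only state that the TTL is decremented and reset. Caching policy and the compacting techniques are not modelled.

```python
class TCache:
    """Hypothetical TCache: category -> {peer: [num_resources, ttl]}."""

    def __init__(self, start_ttl=3):
        self.start_ttl = start_ttl
        self.entries = {}

    def revalidate(self, category, peer, num_resources):
        # Insert a new entry, or reset an existing entry's TTL to the start value.
        self.entries.setdefault(category, {})[peer] = [num_resources, self.start_ttl]

    def tick(self):
        # Regular decrement. Assumption: entries whose TTL reaches 0 are dropped.
        for category in list(self.entries):
            peers = self.entries[category]
            for peer in list(peers):
                peers[peer][1] -= 1
                if peers[peer][1] <= 0:
                    del peers[peer]
            if not peers:
                del self.entries[category]

    def lookup(self, category):
        # Peers ranked by # of resources, highest first (the "experts").
        peers = self.entries.get(category, {})
        return sorted(peers, key=lambda p: peers[p][0], reverse=True)


cache = TCache(start_ttl=2)
cache.revalidate("Air", "p1", 5)
cache.revalidate("Air", "p2", 9)
ranked = cache.lookup("Air")          # p2 has more resources, so it ranks first
cache.tick()
cache.revalidate("Air", "p1", 5)      # p1 is revalidated, p2 is not
cache.tick()
after_expiry = cache.lookup("Air")    # only p1 survives
```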
Experimental setup
Simulations
Excerpts of DMOZ taxonomy
Synthetic network topologies
Resource allocation: 80/20 rule
Queries are taxonomic categories
A number of peers have role as querying peers
Measured: Contacted peers, messages, recall and latency
In this presentation: Results using flooding and TCDN query routing
Improvements in recall
          NM (F)   NM (TC)   Recall (F)   Recall (TC)
TTL=1        7.8       7.0       0.0022        0.0019
TTL=3      166.7     166.0       0.0117        0.0149
TTL=5      524.7     523.9       0.0282        0.0717
TTL=7     1058.6    1057.7       0.0506        0.1835
TTL=9     1721.0    1719.6       0.0773        0.2930
TTL=11    2566.3    2566.0       0.1104        0.4012
TTL=13    3536.5    3535.8       0.1477        0.4891
TTL=15    4560.2    4558.7       0.1864        0.5755
Primary reason for improvement: More intelligent query forwarding
          NC (F)   NC (TC)   Recall (F)   Recall (TC)
TTL=1        7.8       6.7       0.0022        0.0019
TTL=3       45.3      53.4       0.0117        0.0149
TTL=5      110.6     158.0       0.0282        0.0717
TTL=7      199.9     346.8       0.0506        0.1835
TTL=9      305.6     583.1       0.0773        0.2930
TTL=11     437.7     840.3       0.1104        0.4012
TTL=13     586.7    1120.6       0.1477        0.4891
TTL=15     741.6    1372.4       0.1864        0.5755
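One way to read the two tables together is to compute how many distinct peers are contacted per message sent. This assumes that NM is the number of messages and NC the number of contacted peers, which matches the measured quantities listed in the experimental setup but is not spelled out in the slides; the ratio itself is an illustrative derived quantity, not one reported by the authors.

```python
# TTL -> (NM (F), NM (TC), NC (F), NC (TC)), copied from the two tables.
data = {
    1: (7.8, 7.0, 7.8, 6.7),
    3: (166.7, 166.0, 45.3, 53.4),
    5: (524.7, 523.9, 110.6, 158.0),
    7: (1058.6, 1057.7, 199.9, 346.8),
    9: (1721.0, 1719.6, 305.6, 583.1),
    11: (2566.3, 2566.0, 437.7, 840.3),
    13: (3536.5, 3535.8, 586.7, 1120.6),
    15: (4560.2, 4558.7, 741.6, 1372.4),
}

# Contacted peers per message: (flooding, taxonomy cache).
peers_per_message = {
    ttl: (nc_f / nm_f, nc_tc / nm_tc)
    for ttl, (nm_f, nm_tc, nc_f, nc_tc) in data.items()
}

for ttl, (f, tc) in peers_per_message.items():
    print(f"TTL={ttl:2d}: F={f:.3f}  TC={tc:.3f}")
```

For TTL=3 and above, the taxonomy cache reaches more peers per message than flooding, which is consistent with the slide's point that the gain comes from where queries are sent rather than from sending more of them.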
Improvement and scalability
[Figure: % improvement (y-axis, 0 to 400) versus TTL (x-axis, 5 to 20), with one curve each for N1000, N2000 and N3000]
Latency reduction
TCache results in very fast retrieval of first results
Finding all results: approximately similar performance, because flooding is used in both techniques
Summary and further work
Presented motivation and context
Taxonomy-based querying and query routing
TCache architecture and maintenance
Experimental results proving our claims
Future/ongoing work:
– Employing the techniques for XML/XPath querying in P2P context (to appear at IEEE P2P’2006)
– Integration of different taxonomies