17th International World Wide Web Conference 2008 Beijing, China
XML Data Dissemination using Automata on top of Structured Overlay Networks
Iris Miliaraki, Zoi Kaoudi, Manolis Koubarakis
Department of Informatics and Telecommunications
National and Kapodistrian University of Athens
24 April 2008
Outline
XML Dissemination scenario
Problems
Background: DHTs
Our approach
Experiments
Future work
XML Dissemination scenario
[Figure: publishers send XML documents to an XML dissemination system; subscribers register XPath/XQuery queries with it.]
Applications: news monitoring, publication monitoring
Centralized: YFilter, XTrie, FiST, Index-Filter, XPush
Distributed: ONYX, Gong et al. [ICDE05], Parallel/Hierarchical XTrie, Snoeren [SOSP 2001]
XML Dissemination: Broker-based architecture
Mesh or tree-based overlays
[Figure: publishers send XML documents and subscribers send XPath/XQuery queries into a broker overlay.]
Problems
Load imbalances
XML Dissemination: Broker-based architecture
Systems like ONYX and the work of Gong et al. [ICDE05]
Mesh or tree-based overlays
[Figure: publishers send XML documents and subscribers send XPath/XQuery queries into a broker overlay.]
Problems
Load imbalances
Centralized control: single point of failure and bottleneck
Problems
Load imbalances
Centralized control: single point of failure and bottleneck
Scalability (size of routing tables)
Background: DHTs
Structured overlay networks
Solve the item location problem in a distributed and dynamic network of N nodes in O(log N) hops: let x be some data item; find x.
Distributed version of the hash table data structure: id = Hash(K)
Main operations:
Put: given a key (for a data item), map the key onto a node.
Get: find the location of the data item with a given key.
Successor peer → responsible peer
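The put/get idea above can be sketched in a few lines. This is only an illustrative, single-process model of a Chord-like ring: the names `Ring`, `chord_id` and the identifier size `M` are assumptions made for the example, and the lookup scans a sorted list instead of routing in O(log N) hops as a real DHT would.

```python
import hashlib
from bisect import bisect_left

M = 16  # assumed identifier size: the ring has 2**M positions


def chord_id(name: str) -> int:
    """Hash a string onto the identifier ring: id = Hash(K)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class Ring:
    """Toy model of a structured overlay (no routing, no churn)."""

    def __init__(self, peer_names):
        # Peers are placed on the ring by hashing their names.
        self.peers = sorted((chord_id(p), p) for p in peer_names)
        self.ids = [pid for pid, _ in self.peers]
        self.store = {p: {} for p in peer_names}

    def successor(self, key_id: int) -> str:
        """First peer whose id is >= key_id, wrapping around the ring."""
        i = bisect_left(self.ids, key_id)
        return self.peers[i % len(self.peers)][1]

    def put(self, key: str, value) -> str:
        """Store the item at the successor (responsible) peer of its key."""
        peer = self.successor(chord_id(key))
        self.store[peer][key] = value
        return peer

    def get(self, key: str):
        """Locate the responsible peer and read the item from it."""
        peer = self.successor(chord_id(key))
        return self.store[peer].get(key)
```

A call such as `Ring(["P1", "P2", "P3"]).put("state-4", ...)` returns the name of the responsible peer; which peer that is depends on the hash values.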
XML Dissemination revisited: Structured overlay network architecture
[Figure: publishers send XML documents and subscribers send XPath/XQuery queries into a structured overlay network.]
Problems revisited
Load imbalances
Centralized control: single point of failure and bottleneck
Scalability (size of routing tables)
Automata-based approaches
XFilter and YFilter, ONYX, XTrie, IndexFilter, FiST etc.
Main idea:
Construct an automaton from a set of XPath/XQuery queries
Use it as a matching engine against the XML documents
YFilter – NFA Construction
Q1: /dblp/phdthesis/year = ‘2008’
Q2: /dblp/proceedings/school = ‘Univ. of Athens’
Q3: /dblp/proceedings/title = ‘XML Dissemination’
Q4: /dblp/*/author = ‘John Doe’
Q5: //*/cite = [12743]
[Figure: the NFA is built incrementally, one query at a time; queries with a common prefix share states.]
After Q1: 0 -dblp-> 1 -phdthesis-> 2 -year-> 3 (accepts Q1)
After Q2: adds 1 -proceedings-> 4 -school-> 5 (accepts Q2)
After Q3: adds 4 -title-> 6 (accepts Q3)
After Q4: adds 1 -*-> 7 -author-> 8 (accepts Q4)
After Q5: adds 0 -ε-> 9 (with a * self-loop), 9 -*-> 10 -cite-> 11 (accepts Q5)
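The incremental construction can be sketched as a prefix-sharing insertion. This is a minimal illustration and not the YFilter implementation: the `NFA` class, the `parse` helper and the `"eps"` label are names invented for the example, and value predicates such as `= ‘2008’` are deliberately left out, so only the structural part of each query is inserted.

```python
import re
from dataclasses import dataclass, field


def parse(path: str):
    """Split a path like '/dblp/*/author' into (axis, name) location steps."""
    return re.findall(r"(//|/)([^/]+)", path)


@dataclass
class NFA:
    # transitions[state][label] -> target; labels are element names, "*" or "eps"
    transitions: dict = field(default_factory=lambda: {0: {}})
    self_loops: set = field(default_factory=set)   # states with a "*" self-loop
    accepting: dict = field(default_factory=dict)  # state -> list of query ids

    def _step(self, state: int, label: str) -> int:
        """Reuse an existing transition, or create a new state for it."""
        edges = self.transitions[state]
        if label not in edges:
            new_state = len(self.transitions)
            self.transitions[new_state] = {}
            edges[label] = new_state
        return edges[label]

    def add_query(self, qid: str, path: str) -> None:
        state = 0
        for axis, name in parse(path):
            if axis == "//":
                # Descendant axis: epsilon transition to a state that loops on "*".
                state = self._step(state, "eps")
                self.self_loops.add(state)
            state = self._step(state, name)
        self.accepting.setdefault(state, []).append(qid)


nfa = NFA()
nfa.add_query("Q1", "/dblp/phdthesis/year")
nfa.add_query("Q2", "/dblp/proceedings/school")
nfa.add_query("Q3", "/dblp/proceedings/title")
nfa.add_query("Q4", "/dblp/*/author")
nfa.add_query("Q5", "//*/cite")
```

Inserting the five queries in this order yields states numbered 0 to 11, with the same state numbers as the figure.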
Main idea
Utilize a distributed version of a state-of-the-art approach, YFilter
Instead of a centralized NFA, distribute the NFA in the DHT
Distributing the NFA on top of DHT
[Figure: a ring of peers P1 to P10; each NFA state is stored at the successor peer of its state key.]
State key:       0   1   2   3   4   5   6   7   8    9   10  11
Successor peer:  P3  P5  P1  P2  P6  P7  P7  P8  P10  P4  P9  P10
[Figure: with ℓ=0 each peer stores only the states it is responsible for; with ℓ=1 it also stores their neighbouring states, e.g. the peer responsible for state 1 stores states 1 2 4 7.]
States stored per peer for ℓ=1:
P1: 2 3
P2: 3
P3: 0 1 9 10
P4: 9 10
P5: 1 2 4 7
P6: 4 5 6
P7: 5 6
P8: 7 8
P9: 10 11
P10: 8 11
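The per-peer groupings can be reproduced with a small sketch. The state-to-peer table is copied from the slide; the rest is an assumption made for illustration: `neighbourhood` and `placement` are hypothetical helper names, and the rule that ε-transitions do not count towards ℓ is a guess that happens to reproduce the figure's groupings, not something the slides state.

```python
from collections import defaultdict

# Example NFA from the slides (targets only; "eps" marks the epsilon edge).
TRANSITIONS = {
    0: {"dblp": 1, "eps": 9},
    1: {"phdthesis": 2, "proceedings": 4, "*": 7},
    2: {"year": 3},
    4: {"school": 5, "title": 6},
    7: {"author": 8},
    9: {"*": 10},
    10: {"cite": 11},
}

# State key -> successor peer, copied from the slide's table.
SUCCESSOR = {0: "P3", 1: "P5", 2: "P1", 3: "P2", 4: "P6", 5: "P7",
             6: "P7", 7: "P8", 8: "P10", 9: "P4", 10: "P9", 11: "P10"}


def neighbourhood(state: int, l: int) -> set:
    """States reachable from `state` within l transitions.

    Assumption: an epsilon edge costs 0, every other edge costs 1.
    """
    best = {state: 0}
    frontier = [state]
    while frontier:
        s = frontier.pop()
        for label, target in TRANSITIONS.get(s, {}).items():
            d = best[s] + (0 if label == "eps" else 1)
            if d <= l and d < best.get(target, l + 1):
                best[target] = d
                frontier.append(target)
    return set(best)


def placement(l: int) -> dict:
    """Peer -> set of states it stores for a given value of l."""
    stored = defaultdict(set)
    for state, peer in SUCCESSOR.items():
        stored[peer] |= neighbourhood(state, l)
    return dict(stored)
```

Under these assumptions, `placement(1)` gives P5 the states 1, 2, 4 and 7, and P3 the states 0, 1, 9 and 10.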
YFilter - NFA Execution
Incoming XML document:
<dblp>
  <proceedings>
    <school> Univ. of Athens </school>
    <title> XML and DHTs </title>
  </proceedings>
</dblp>
Runtime stack (active states after each start tag):
Start of document: 0
dblp: 1 9 10
proceedings: 4 7 9 10
school: 5 9 10
title: 6 9 10
End of document
These paths can be executed in parallel!
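The runtime stack can be traced with a short sketch. The transition table is hard-coded from the example NFA on the slides; the function names `eps_closure`, `step` and `run` are made up for this illustration, and value predicates are not evaluated, so the reported matches are structural only.

```python
import io
import xml.etree.ElementTree as ET

TRANSITIONS = {
    0: {"dblp": 1, "eps": 9},
    1: {"phdthesis": 2, "proceedings": 4, "*": 7},
    2: {"year": 3},
    4: {"school": 5, "title": 6},
    7: {"author": 8},
    9: {"*": 10},
    10: {"cite": 11},
}
SELF_LOOPS = {9}  # state 9 loops on "*" (descendant axis of Q5)
ACCEPTING = {3: "Q1", 5: "Q2", 6: "Q3", 8: "Q4", 11: "Q5"}


def eps_closure(states):
    """Add every state reachable through epsilon transitions."""
    closure, todo = set(states), list(states)
    while todo:
        target = TRANSITIONS.get(todo.pop(), {}).get("eps")
        if target is not None and target not in closure:
            closure.add(target)
            todo.append(target)
    return closure


def step(active, tag):
    """Active states after reading the start tag `tag`."""
    nxt = set()
    for s in eps_closure(active):
        edges = TRANSITIONS.get(s, {})
        for label in (tag, "*"):
            if label in edges:
                nxt.add(edges[label])
        if s in SELF_LOOPS:
            nxt.add(s)
    return nxt


def run(xml_text):
    """Push a state set on every start tag, pop it on every end tag."""
    stack, trace, matched = [{0}], [], set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            active = step(stack[-1], elem.tag)
            stack.append(active)
            trace.append((elem.tag, sorted(active)))
            matched |= {ACCEPTING[s] for s in active if s in ACCEPTING}
        else:
            stack.pop()
    return trace, matched


DOC = ("<dblp><proceedings><school>Univ. of Athens</school>"
       "<title>XML and DHTs</title></proceedings></dblp>")
trace, matched = run(DOC)
```

For the example document, `trace` holds the same state sets as the runtime stack shown on the slide.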
Distributed NFA execution – Iterative
Incoming XML document:
<dblp> <proceedings> <school> Univ. of Athens </school> <title> XML and DHTs </title> </proceedings> </dblp>
[Figure: the publisher drives the execution itself, contacting at each step the peers on the ring (P1 to P10) that are responsible for the active states: 0, then 1 9 10, then 4 7 9 10, then 5 9 10 and 6 9 10.]
Publisher becomes overloaded!
Distributed NFA execution - Recursive
Incoming XML document:
<dblp> <proceedings> <school> Univ. of Athens </school> <title> XML and DHTs </title> </proceedings> </dblp>
[Figure: execution is forwarded from peer to peer on the ring (P1 to P10) along the paths of the NFA. From state 0 it continues to states 1, 9 and 10; from state 1 to states 4 and 7; from state 4 to states 5 and 6. The resulting execution paths are 0 1 4 5, 0 1 4 6, 0 1 7, 0 9 and 0 10.]
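The difference between the two execution styles can be illustrated with a toy message count. Everything here is a simplifying assumption rather than the protocol itself: the execution tree is read off the figure, one message is charged per visited state, replies are ignored, and `iterative_sends` and `recursive_sends` are hypothetical names.

```python
from collections import Counter

# Execution tree for the example document, read off the slide:
# state 0 leads to 1, 9, 10; state 1 to 4, 7; state 4 to 5, 6.
EXECUTION_TREE = {0: [1, 9, 10], 1: [4, 7], 4: [5, 6]}

# State key -> successor peer, copied from the slides.
SUCCESSOR = {0: "P3", 1: "P5", 2: "P1", 3: "P2", 4: "P6", 5: "P7",
             6: "P7", 7: "P8", 8: "P10", 9: "P4", 10: "P9", 11: "P10"}


def iterative_sends() -> Counter:
    """Iterative: the publisher contacts the peer of every active state."""
    sends = Counter()
    level = [0]
    while level:
        sends["Publisher"] += len(level)  # one request per active state
        level = [c for s in level for c in EXECUTION_TREE.get(s, [])]
    return sends


def recursive_sends() -> Counter:
    """Recursive: the publisher sends once; peers forward the execution."""
    sends = Counter({"Publisher": 1})  # message to the peer of state 0
    stack = [0]
    while stack:
        s = stack.pop()
        children = EXECUTION_TREE.get(s, [])
        sends[SUCCESSOR[s]] += len(children)  # forwarded by the peer of s
        stack.extend(children)
    return sends
```

In this toy model both styles send eight messages in total, but the iterative publisher sends all eight itself, while in the recursive style the sending is spread over the publisher and peers P3, P5 and P6.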
Experimental evaluation
Chord simulator
2 different document workloads:
Aggregated, including DBLP, NITF, ebXML, Auction (XMark)
NITF
2 kinds of query sets:
Random
Distinct
Metrics
Network traffic: total number of messages
Latency: longest chain of hops
Filtering load: number of messages received during execution
Iterative vs Recursive
Varying number of queries – Network traffic
Varying number of queries - Latency
Load balancing
Virtual peers: originally proposed in Chord; mapping of multiple virtual peers to each real peer
Load-shedding: replicate on demand
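The virtual-peer idea can be sketched as follows. This is a generic consistent-hashing illustration and not the evaluated system: `ring_id`, `build_ring`, `owner` and `load` are names chosen for the example, the `"peer#index"` naming of virtual peers is an assumption, and load-shedding is not modelled.

```python
import hashlib
from bisect import bisect_left
from collections import Counter


def ring_id(name: str, bits: int = 32) -> int:
    """Hash a name onto a ring with 2**bits positions."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def build_ring(real_peers, virtual_per_peer: int):
    """Place several virtual peers on the ring for each real peer."""
    ring = []
    for peer in real_peers:
        for v in range(virtual_per_peer):
            # Assumed naming scheme for virtual peers: "<peer>#<index>".
            ring.append((ring_id(f"{peer}#{v}"), peer))
    ring.sort()
    return ring


def owner(ring, key: str) -> str:
    """Real peer behind the virtual peer that succeeds the key's id."""
    ids = [pid for pid, _ in ring]
    i = bisect_left(ids, ring_id(key))
    return ring[i % len(ring)][1]


def load(ring, keys) -> Counter:
    """Number of keys each real peer is responsible for."""
    return Counter(owner(ring, k) for k in keys)
```

Comparing `load(build_ring(peers, 1), keys)` with `load(build_ring(peers, 8), keys)` for the same keys shows how the per-peer counts change when each real peer holds several ring positions.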
Load balancing – Filtering load
Conclusions
DHT-based protocols overcoming weaknesses of broker-based architectures
Utilize a distributed YFilter engine
Exploit inherent parallelism of an automaton
Experimental evaluation
Future Work
Implementation and experimentation on an Internet-scale testbed like PlanetLab
More sophisticated methods for predicate evaluation
Thank you for your attention
Questions?