17th International World Wide Web Conference 2008, Beijing, China

XML Data Dissemination using Automata on top of Structured Overlay Networks

Iris Miliaraki, Zoi Kaoudi, Manolis Koubarakis

Department of Informatics and Telecommunications
National and Kapodistrian University of Athens

24 April 2008

Outline

XML Dissemination scenario
Problems
Background: DHTs
Our approach
Experiments
Future work

XML Dissemination scenario

[Figure: publishers send XML documents to an XML dissemination system, and subscribers register XPath/XQuery queries with it. Example applications: news monitoring, publication monitoring.]

Centralized systems: YFilter, XTrie, FiST, Index-Filter, XPush
Distributed systems: ONYX, Gong et al. [ICDE05], Parallel/Hierarchical XTrie, Snoeren [SOSP 2001]


XML Dissemination: Broker-based architecture

Systems like ONYX and the work of Gong et al. [ICDE05]
Mesh- or tree-based overlays of brokers

[Figure: publishers send XML documents into a mesh or tree of brokers; subscribers attach XPath/XQuery queries, and documents are routed through the brokers towards matching subscribers.]


Problems

Load imbalances
Centralized control: single point of failure and bottleneck
Scalability (size of routing tables)


Background: DHTs (structured overlay networks)

Solve the item location problem in a distributed and dynamic network of nodes, in O(log N) hops: given a data item x, find the node that stores x.
A distributed version of the hash table data structure: id = Hash(K)
Main operations:
Put: given a key K for a data item, map the key onto a node.
Get: find the location of a data item with a given key.
The successor peer of id = Hash(K) is the peer responsible for K.
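The Put/Get interface can be made concrete with a small single-process sketch, assuming SHA-1 as the hash function and a toy 16-bit identifier circle. `ToyDHT`, `chord_id`, and the `state:4` key format are hypothetical names. The sketch only computes the successor peer of id = Hash(K); it does not model Chord's O(log N) multi-hop routing.

```python
import bisect
import hashlib

M = 16  # bits in the identifier space (toy value; Chord uses SHA-1's 160 bits)


def chord_id(key: str) -> int:
    """id = Hash(K): map a key onto the identifier circle [0, 2**M)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class ToyDHT:
    """Single-process model of a DHT: peers sit on a hash ring and each key
    is stored at the successor peer of its id. Multi-hop routing is omitted."""

    def __init__(self, peer_names):
        ring = sorted((chord_id(name), name) for name in peer_names)
        self.ids = [pid for pid, _ in ring]
        self.names = [name for _, name in ring]
        self.store = {name: {} for name in peer_names}

    def successor(self, ident: int) -> str:
        # First peer clockwise whose id is >= ident, wrapping around to the start.
        i = bisect.bisect_left(self.ids, ident)
        return self.names[i % len(self.names)]

    def put(self, key: str, value) -> str:
        """Store value at the peer responsible for key and return that peer."""
        peer = self.successor(chord_id(key))
        self.store[peer][key] = value
        return peer

    def get(self, key: str):
        """Ask the responsible peer for key; None if nothing was stored."""
        return self.store[self.successor(chord_id(key))].get(key)


if __name__ == "__main__":
    dht = ToyDHT([f"P{i}" for i in range(1, 11)])
    owner = dht.put("state:4", {"school": 5, "title": 6})
    print(owner, dht.get("state:4"))
```

Because Put and Get both go through `successor(chord_id(key))`, any peer can locate an item without a central directory.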

XML Dissemination revisited: Structured overlay network architecture

[Figure: the same publishers and subscribers, now connected through a structured overlay network (DHT) instead of a broker network; XML documents flow to subscribers whose XPath/XQuery queries match.]

Problems revisited

Load imbalances
Centralized control: single point of failure and bottleneck
Scalability (size of routing tables)

Automata-based approaches

XFilter and YFilter, ONYX, XTrie, Index-Filter, FiST, etc.
Main idea: construct an automaton from a set of XPath/XQuery queries and use it as a matching engine against incoming XML documents.


YFilter – NFA Construction

Q1: /dblp/phdthesis/year = ‘2008’
Q2: /dblp/proceedings/school = ‘Univ. of Athens’
Q3: /dblp/proceedings/title = ‘XML Dissemination’
Q4: /dblp/*/author = ‘John Doe’
Q5: //*/cite = [12743]

Transitions of the resulting NFA (common prefixes are shared; accepting states in brackets):
0 --dblp--> 1
1 --phdthesis--> 2 --year--> 3 [Q1]
1 --proceedings--> 4 --school--> 5 [Q2]
4 --title--> 6 [Q3]
1 --*--> 7 --author--> 8 [Q4]
0 --ε--> 9, with a * self-loop on state 9
9 --*--> 10 --cite--> 11 [Q5]
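The construction can be reproduced with a short sketch that merges common prefixes of the query paths into one NFA. It is a simplified reading of YFilter's construction, assuming that `//` is encoded as an ε-edge into a state with a `*` self-loop, and it covers only the path structure: the value predicates (year = ‘2008’ and so on) are left out. `YFilterNFA` and `add_query` are illustrative names. Adding Q1 to Q5 in order yields the same state numbers 0 to 11 as above.

```python
import re

EPS = "ε"  # label of the epsilon edge created for '//'


class YFilterNFA:
    """Shared-prefix NFA over XPath location steps (structure only)."""

    def __init__(self):
        self.trans = [{}]   # trans[s][label] -> target state; state 0 is the start
        self.loops = set()  # states carrying a '*' self-loop (created for '//')
        self.accept = {}    # accepting state -> set of query ids

    def _edge(self, state: int, label: str) -> int:
        # Reuse an existing edge when a prefix is shared; otherwise add a new state.
        if label not in self.trans[state]:
            self.trans.append({})
            self.trans[state][label] = len(self.trans) - 1
        return self.trans[state][label]

    def add_query(self, qid: str, path: str) -> None:
        state = 0
        # '/a//b' is split into [('/', 'a'), ('//', 'b')].
        for axis, label in re.findall(r"(//|/)([^/]+)", path):
            if axis == "//":
                state = self._edge(state, EPS)  # ε-edge into a self-looping state
                self.loops.add(state)
            state = self._edge(state, label)
        self.accept.setdefault(state, set()).add(qid)


QUERIES = [
    ("Q1", "/dblp/phdthesis/year"),
    ("Q2", "/dblp/proceedings/school"),
    ("Q3", "/dblp/proceedings/title"),
    ("Q4", "/dblp/*/author"),
    ("Q5", "//*/cite"),
]

nfa = YFilterNFA()
for qid, path in QUERIES:
    nfa.add_query(qid, path)

if __name__ == "__main__":
    for state, edges in enumerate(nfa.trans):
        for label, target in edges.items():
            print(f"{state} --{label}--> {target}")
    print("self-loops:", sorted(nfa.loops), "accepting:", nfa.accept)
```

Sharing prefixes is what keeps the automaton small: Q2 and Q3 reuse states 0, 1 and 4, and all queries starting with /dblp reuse the edge 0 to 1.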


Main idea

Utilize a distributed version of YFilter, a state-of-the-art approach
Instead of a centralized NFA, distribute the NFA in the DHT

Distributing the NFA on top of DHT

State key   Successor peer
0           P3
1           P5
2           P1
3           P2
4           P6
5           P7
6           P7
7           P8
8           P10
9           P4
10          P9
11          P10

[Figure: Chord ring with peers P1 to P10; each NFA state (0 to 11) is placed on the successor peer of its key.]


[Figure: the same ring, annotated with states 1, 2, 4, 7 (state 1 and the states it leads to) and with ℓ=0 and ℓ=1.]

[Figure: Chord ring annotated with the groups of NFA states kept at the peers: 0 1 9 10; 1 2 4 7; 2 3; 3; 4 5 6; 5 6; 7 8; 9 10; 10 11; 11.]
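The state-key table can be read as a plain mapping. The sketch below copies it into a dict, inverts it to show which states each peer holds (P7 and P10 each hold two states, every other peer one), and lists the peers involved for an example set of active states, {4, 7, 9, 10}. It does not model how the keys are hashed or any replication of states across peers; `states_per_peer` and `peers_to_contact` are hypothetical helpers.

```python
from collections import defaultdict

# State key -> successor peer, transcribed from the table above.
SUCCESSOR = {0: "P3", 1: "P5", 2: "P1", 3: "P2", 4: "P6", 5: "P7",
             6: "P7", 7: "P8", 8: "P10", 9: "P4", 10: "P9", 11: "P10"}


def states_per_peer(successor):
    """Invert the table: which NFA states does each peer store?"""
    by_peer = defaultdict(list)
    for state, peer in sorted(successor.items()):
        by_peer[peer].append(state)
    return dict(by_peer)


def peers_to_contact(active_states, successor):
    """Peers that take part when these NFA states are active."""
    return {successor[s] for s in active_states}


if __name__ == "__main__":
    table = states_per_peer(SUCCESSOR)
    for peer in sorted(table, key=lambda p: int(p[1:])):
        print(peer, table[peer])
    print(sorted(peers_to_contact({4, 7, 9, 10}, SUCCESSOR)))
```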

YFilter - NFA Execution

Incoming XML document:
<dblp> <proceedings> <school> Univ. of Athens </school> <title> XML and DHTs </title> </proceedings> </dblp>

Runtime stack (the active states are pushed on each start tag and popped on each end tag):
Start of document: 0
<dblp>: 1 9 10
<proceedings>: 4 7 9 10
<school>: 5 9 10
<title> (after </school> has been popped): 6 9 10
End of document: the stack is back to 0

These paths can be executed in parallel!
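The runtime-stack behaviour can be replayed with the sketch below, which hard-codes the NFA from the construction slide and streams the document through Python's `xml.etree.ElementTree.XMLPullParser`. It assumes the usual YFilter rules (follow named and `*` edges, keep self-looping states, take ε-closures, pop on end tags). Predicates are ignored, and the bottom of the stack is {0, 9} rather than {0} because it includes the ε-closure of the start state.

```python
import xml.etree.ElementTree as ET

EPS = "ε"
# NFA from the construction slide (structure of Q1-Q5 only, predicates omitted).
TRANS = {
    0: {"dblp": 1, EPS: 9},
    1: {"phdthesis": 2, "proceedings": 4, "*": 7},
    2: {"year": 3},
    4: {"school": 5, "title": 6},
    7: {"author": 8},
    9: {"*": 10},
    10: {"cite": 11},
}
LOOPS = {9}  # state 9 loops on '*' (it encodes '//')
ACCEPT = {3: "Q1", 5: "Q2", 6: "Q3", 8: "Q4", 11: "Q5"}


def eps_closure(states):
    """Add every state reachable through ε-edges."""
    result, todo = set(states), list(states)
    while todo:
        target = TRANS.get(todo.pop(), {}).get(EPS)
        if target is not None and target not in result:
            result.add(target)
            todo.append(target)
    return result


def step(active, tag):
    """Active states after reading a start tag."""
    nxt = set()
    for s in active:
        edges = TRANS.get(s, {})
        if tag in edges:
            nxt.add(edges[tag])
        if "*" in edges:
            nxt.add(edges["*"])
        if s in LOOPS:
            nxt.add(s)
    return eps_closure(nxt)


def sax_events(xml_text):
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()  # events not yet read remain available via read_events()
    for event, elem in parser.read_events():
        yield event, elem.tag


def run(xml_text):
    stack = [eps_closure({0})]  # bottom of the runtime stack
    pushed, matched = [], set()
    for event, tag in sax_events(xml_text):
        if event == "start":
            top = step(stack[-1], tag)
            stack.append(top)
            pushed.append(sorted(top))
            matched |= {ACCEPT[s] for s in top if s in ACCEPT}
        else:
            stack.pop()  # end tag: backtrack to the parent's active states
    return pushed, matched


DOC = ("<dblp> <proceedings> <school> Univ. of Athens </school>"
       " <title> XML and DHTs </title> </proceedings></dblp>")

if __name__ == "__main__":
    pushed, matched = run(DOC)
    for states in pushed:
        print(states)
    print(sorted(matched))
```

The pushed sets come out as [1, 9, 10], [4, 7, 9, 10], [5, 9, 10] and [6, 9, 10], matching the runtime stack above. Structurally Q2 and Q3 reach accepting states; Q3's value predicate (title = ‘XML Dissemination’) would then reject this document, whose title is ‘XML and DHTs’.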

Distributed NFA execution – Iterative

Incoming XML document: the same <dblp> document as above.

[Figure: the publisher keeps the runtime stack (0; 1 9 10; 4 7 9 10; 5 9 10; 6 9 10) and contacts the peers on the Chord ring (P1 to P10) that are responsible for the active states.]

Publisher becomes overloaded!

Distributed NFA execution - Recursive

Incoming XML document: the same <dblp> document as above.

[Figure: the publisher sends the document to the peer responsible for state 0; each peer then forwards it to the peers responsible for the next states, so execution proceeds along parallel chains of states such as 0 → 1 → 4 → 5 and 0 → 1 → 4 → 6.]
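Why the iterative variant overloads the publisher can be seen in a toy message-count model. The assumptions here are ours, not the protocol from the talk: each time an NFA state has to be expanded on an element, the peer holding that state receives one request; in the iterative variant it sends a reply back to the publisher, which drives the runtime stack, while in the recursive variant it forwards the next requests itself. Match notifications are not counted, and states are expanded path by path without deduplication. The NFA and peer table are copied from the earlier slides; the tuple encoding of the document is hypothetical.

```python
from collections import Counter

EPS = "ε"
TRANS = {0: {"dblp": 1, EPS: 9},
         1: {"phdthesis": 2, "proceedings": 4, "*": 7},
         2: {"year": 3}, 4: {"school": 5, "title": 6},
         7: {"author": 8}, 9: {"*": 10}, 10: {"cite": 11}}
LOOPS = {9}
SUCCESSOR = {0: "P3", 1: "P5", 2: "P1", 3: "P2", 4: "P6", 5: "P7",
             6: "P7", 7: "P8", 8: "P10", 9: "P4", 10: "P9", 11: "P10"}

# The example document as nested (tag, children) tuples.
DOC = ("dblp", [("proceedings", [("school", []), ("title", [])])])


def closure(states):
    result, todo = set(states), list(states)
    while todo:
        t = TRANS.get(todo.pop(), {}).get(EPS)
        if t is not None and t not in result:
            result.add(t)
            todo.append(t)
    return result


def step(state, tag):
    edges = TRANS.get(state, {})
    nxt = {edges[label] for label in (tag, "*") if label in edges}
    if state in LOOPS:
        nxt.add(state)
    return closure(nxt)


def expansions(state, node):
    """Yield every NFA state that must be expanded, path by path."""
    tag, children = node
    yield state
    for nxt in step(state, tag):
        for child in children:
            yield from expansions(nxt, child)


def message_load(doc, mode):
    received = Counter()
    for start in closure({0}):
        for state in expansions(start, doc):
            received[SUCCESSOR[state]] += 1  # request handled by the state's peer
            if mode == "iterative":
                received["publisher"] += 1   # its reply returns to the publisher
    return received


if __name__ == "__main__":
    for mode in ("iterative", "recursive"):
        load = message_load(DOC, mode)
        print(mode, "publisher:", load["publisher"],
              "peers:", sum(load.values()) - load["publisher"])
```

Under this model both variants put the same 13 requests on the peers, but only the iterative one also funnels 13 replies into the publisher.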

Experimental evaluation

Chord simulator
Two document workloads: Aggregated (including DBLP, NITF, ebXML, Auction (XMark)) and NITF
Two kinds of query sets: Random and Distinct

Metrics

Network traffic: total number of messages
Latency: longest chain of hops
Filtering load: number of messages received during execution
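The three metrics can be computed from a message trace. The sketch assumes each message records which earlier message triggered it, so that latency is the longest chain of such causes, and it counts filtering load per receiving peer. The `Message` record and the six-message trace are hypothetical and only illustrate the definitions; they are not measurements.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Message(NamedTuple):
    msg_id: int
    sender: str
    receiver: str
    caused_by: Optional[int]  # message whose processing triggered this one (None = initial)


def metrics(log):
    by_id = {m.msg_id: m for m in log}

    def chain_length(m):
        hops = 1
        while m.caused_by is not None:
            m = by_id[m.caused_by]
            hops += 1
        return hops

    traffic = len(log)                                         # network traffic
    latency = max((chain_length(m) for m in log), default=0)   # longest chain of hops
    filtering_load = Counter(m.receiver for m in log)          # messages received per peer
    return traffic, latency, filtering_load


# Hypothetical trace of a recursive execution (ids and peers are illustrative).
LOG = [
    Message(1, "publisher", "P3", None),
    Message(2, "P3", "P5", 1),
    Message(3, "P3", "P4", 1),
    Message(4, "P5", "P6", 2),
    Message(5, "P5", "P8", 2),
    Message(6, "P4", "P9", 3),
]

if __name__ == "__main__":
    traffic, latency, load = metrics(LOG)
    print(traffic, latency, dict(load))
```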


Iterative vs Recursive


Varying number of queries – Network traffic


Varying number of queries - Latency

Load balancing

Virtual peers: originally proposed in Chord; multiple virtual peers are mapped to each real peer
Load shedding: replicate on demand
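Virtual peers can be sketched by placing each real peer at several points on the hash ring and mapping every virtual position back to its owner. The sketch assumes SHA-1 and 32-bit identifiers and measures only how evenly keys (for example NFA state keys) are spread, as the ratio between the most loaded peer and the average. Filtering load, which also depends on the documents, is not modeled, and load shedding is not covered. `VirtualRing` and `max_over_mean` are illustrative names.

```python
import bisect
import hashlib
from collections import Counter


def ring_id(name: str, bits: int = 32) -> int:
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big") % (2 ** bits)


class VirtualRing:
    """Each real peer is placed on the ring at several virtual positions."""

    def __init__(self, peers, virtual_per_peer):
        points = sorted((ring_id(f"{p}#v{v}"), p)
                        for p in peers for v in range(virtual_per_peer))
        self.ids = [pid for pid, _ in points]
        self.owners = [p for _, p in points]  # real peer behind each virtual peer

    def owner(self, key: str) -> str:
        """Real peer behind the successor virtual peer of Hash(key)."""
        i = bisect.bisect_left(self.ids, ring_id(key))
        return self.owners[i % len(self.ids)]


def max_over_mean(peers, keys, virtual_per_peer):
    """Imbalance: the most loaded peer's key count divided by the average."""
    ring = VirtualRing(peers, virtual_per_peer)
    counts = Counter(ring.owner(k) for k in keys)
    return max(counts.values()) / (len(keys) / len(peers))


if __name__ == "__main__":
    peers = [f"P{i}" for i in range(1, 11)]
    keys = [f"state:{i}" for i in range(10_000)]
    for v in (1, 8, 64):
        print(v, round(max_over_mean(peers, keys, v), 2))
```

With more virtual peers per real peer the ratio typically moves closer to 1, at the cost of more routing state per real peer, since each virtual peer keeps its own routing table.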


Load balancing – Filtering load

Conclusions

DHT-based protocols that overcome the weaknesses of broker-based architectures
A distributed YFilter engine that exploits the inherent parallelism of an automaton
Experimental evaluation

Future Work

Implementation and experiments on an Internet-scale testbed such as PlanetLab
More sophisticated methods for predicate evaluation


Thank you for your attention

Questions?