AmbientDB Relational Query Processing in a P2P Network


Peter Boncz and Caspar Treijtel

LEE BYUNGIL, PL Lab.

Hongik University

2004.11.14


Outline

1. Introduction
    1.1 Goal
    1.2 Assumptions
    1.3 Example: Collaborative Filtering in a P2P Database
    1.4 Overview

2. AmbientDB Architecture
    2.1 Data Model
    2.2 Query Execution in AmbientDB
    2.3 Dataflow Execution
    2.4 Executing the Collaborative Filtering Query

3. DHTs in AmbientDB
    3.1 Example: Approximated Collaborative Filtering

4. Conclusion


1. Introduction (1)

AmbientDB
    A new peer-to-peer (P2P) DBMS prototype
    Developed at CWI (Centrum voor Wiskunde en Informatica)
    Distributed over an ad-hoc P2P network
    Queries in a global query algebra are translated into multi-wave stream processing plans

Ambient Intelligence (AmI)
    Digital environments in which multimedia services are sensitive to people's needs


Music Playlist Scenario

amP2P player
    Log: meta-information (homogeneous)
    Content: AmbientDB instance or external sources (heterogeneous)

AmbientDB
    Its collection: only meta-information


1.1 Goal

Full relational database functionality
Cooperate in an ad-hoc way with other AmbientDB devices

Propose
    A general architecture for AmbientDB
    Complex query processing in an ad-hoc P2P network


1.2 Assumptions (1)

Upscaling (flexibility)
    The number of cooperating devices is potentially large
    Home environment and ad-hoc P2P network
Downscaling
    Devices often have few resources (CPU, memory, network, battery)
Schema integration
    All devices operate under a common global schema
Data placement
    Data placement is determined by the user
Network failure
    Resilience of Chord
    While a query runs, the routing tree stays intact


Chord


1.2 Assumptions (2)

Distributed database
    A priori knowledge: not available in AmbientDB
Federated database
    Static, heterogeneous schema integration
Mobile database
    Centralized database server and client (mobile node)
P2P file sharing system
    Non-centralized, ad-hoc topologies
    Simple keyword text search


Example Music Schema

The global schema "AMP2P" in AmbientDB

Distributed tables
    On the global level, each table is the union of all horizontal fragments of that table


1.3 Example: Collaborative Filtering in a P2P Database (1)

amP2P player
    Access to a local content repository (digital music collection)
    AmbientDB instance

Share all music content in the "home zone"
Share only the meta-information in the huge P2P network


1.3 Example: Collaborative Filtering in a P2P Database (2)

Memory-based implicit voting scheme

Predicted vote of the active user a for item j, where
    v_{i,j} = the vote of user i on item j
    w(a,i) = weight function defined on the active user a and user i
    \bar{v}_i = average vote of user i
    k = normalizing factor

weight(user_a, user_i)
    Number of times the example song has been fully played by user i

Refined form
    Negative information: skipped
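
The symbols defined above appear to correspond to the standard memory-based predictor from the collaborative filtering literature, p_{a,j} = \bar{v}_a + k * sum_i w(a,i) * (v_{i,j} - \bar{v}_i). The following is a minimal Python sketch of that formula, not the slide's own implementation. It assumes votes live in a nested dict and that k normalizes the absolute weights to sum to one; `predict_vote` and the sample data are illustrative names only.

```python
def predict_vote(votes, weights, active, item):
    """Predict the active user's vote on `item`.

    votes:   {user: {item: vote}}   (hypothetical in-memory layout)
    weights: {user: w(active, user)}
    """
    def mean(user):
        user_votes = votes[user].values()
        return sum(user_votes) / len(user_votes)

    # Only users with a non-zero weight who voted on the item contribute.
    peers = [u for u, w in weights.items()
             if u != active and w != 0 and item in votes.get(u, {})]
    norm = sum(abs(weights[u]) for u in peers)
    if norm == 0:
        return mean(active)          # no usable peers: fall back to the mean
    k = 1.0 / norm                   # assumed choice of normalizing factor
    deviation = sum(weights[u] * (votes[u][item] - mean(u)) for u in peers)
    return mean(active) + k * deviation


votes = {
    "a": {"s1": 4, "s2": 2},
    "b": {"s1": 5, "s2": 1, "s3": 3},
    "c": {"s1": 2, "s3": 4},
}
weights = {"b": 2.0, "c": 1.0}
prediction = predict_vote(votes, weights, "a", "s3")
```

In the music scenario, the weight of a peer would be the number of times that peer fully played the example song, so heavy listeners of the example song pull the prediction hardest.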


Collaborative Filtering Query in SQL


1.4 Overview

General architecture
    Includes the data model
Query execution
    Three-level query execution process
DHT (Distributed Hash Table)
    Global table indices
Optimize the query
Related work & future work
Conclusion


AmbientDB Architecture


2. AmbientDB Architecture

Distributed query processor
    Executes queries on all ad-hoc connected devices
P2P protocol: Chord
    Scalable lookup and routing scheme
    P2P IP overlay networks made out of unreliable connections
    Query node = root
    A small number of connections per node
    Simultaneous bi-directional communication and query processing
    DHTs: global table indices
Local DB component
    Local table
    Embedded database
    External data source: wrapper component (distributed database system)
Schema integration engine
    Meta-data translation
    Using view-based schema mappings


AmbientDB Routing Tree Using IP Overlay


2.1 Data Model (1)

Standard relational data model, with relational algebra as the query language

Queries are formulated against global tables
    Answered from the local node, a limited set of nodes, or all reachable nodes

Converging answer
    Query locally
    Re-issue iteratively over more nodes


2.1 Data Model (2)

Abstract table

LT (Local Table)
    Each node has a private schema
Global schema: global table T
    All participating nodes N_i carry a table instance T_i
    In the query node, T_i may be accessed as an LT
DT (Distributed Table)
    Q: the set of nodes that participate in some global query
    The union of the local table instances T_i at the nodes in Q


2.1 Data Model (3)

PT (Partitioned Table)
    Specialization of the DT
    The participating tuples in each T_i are disjoint across all nodes
    Advantage over DT
        Exact query answers can often be computed in an efficient distributed fashion
        By broadcasting a query and letting each node compute a local result without need for communication
    Attaching a bitmap index T_i.Q to each local table T_i

"Virtual" column #NODEID
    Lets queries see on which node the tuples stored in a DT/PT are located
    Allows location-specific query restrictions
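
To make the DT and #NODEID ideas concrete, here is a small Python sketch that scans a distributed table as the union of the local instances at the participating nodes and appends the node identifier as a virtual column. The dict layout and the `scan_dt` helper are assumptions made for illustration; they are not part of AmbientDB.

```python
def scan_dt(instances, participants):
    """Yield the tuples of a distributed table with a virtual node-id column.

    instances:    {node_id: [tuple, ...]}  local table instances T_i
    participants: set of node ids Q taking part in the query
    """
    for node_id in sorted(participants):
        for row in instances.get(node_id, []):
            yield row + (node_id,)   # append the virtual #NODEID value


instances = {1: [("s1",)], 2: [("s1",), ("s2",)], 3: [("s9",)]}
rows = list(scan_dt(instances, {1, 2}))

# Location-specific restriction: keep only tuples stored on node 2.
on_node_2 = [r for r in rows if r[-1] == 2]
```

Note that the tuple ("s1",) shows up twice in `rows`, once per node. A DT allows such overlap, whereas a PT would let it participate at only one node.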


LT, DT and PT


2.2 Query Execution in AmbientDB (1)

Three-level translation
    Abstract level
        User query
        Selection, join, aggregation, sort
        Lists (List<Type>)
        List instances <a,b,c>
    Concrete level
        Table parameters, return value
        Partition, union
    Execution level
        Wave-plans


The Abstract Global Algebra


The Concrete Global Algebra


2.2 Query Execution in AmbientDB (2)

Starting at the leaves
    Abstract query plan -> concrete query plan
    Concrete operators have concrete result types
    The process continues to the root of the query graph
        Local result table, hence LT

Local concrete variant of all abstract operators
    All tables -> LT
Concrete union (T1) -> LT
More efficient alternative query plans


2.2 Query Execution in AmbientDB (3)

select, aggr, order support distributed execution (dist)
    Executed in all nodes on their local partition (LT) of a PT or a DT
    Produce again a distributed result (PT or DT)
    Broadcast the query through the routing tree
    The result is again dispersed over all nodes as a PT or DT

aggr_merge = aggr_local(union_merge(DT)) : LT
    Reduces the fragments to be collected in the query node
    Saves considerable bandwidth
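
The bandwidth saving from merging aggregates can be illustrated with a small sketch. It assumes a routing tree in which every node computes a local count per song and merges its children's partial counts before passing one combined fragment upward. The `Node` class and the count aggregate are hypothetical stand-ins, not AmbientDB's API.

```python
from collections import Counter


class Node:
    """A peer in a hypothetical routing tree, holding a local log fragment."""

    def __init__(self, local_rows, children=()):
        self.local_rows = list(local_rows)   # (user, song) log tuples
        self.children = list(children)

    def aggr_local(self):
        # Local aggregation: number of plays per song on this node only.
        return Counter(song for _user, song in self.local_rows)

    def aggr_merge(self):
        # Merge the partial aggregates of the subtree into one fragment,
        # so each tree edge carries one row per distinct song rather
        # than every log tuple below it.
        merged = self.aggr_local()
        for child in self.children:
            merged.update(child.aggr_merge())
        return merged


leaf1 = Node([("u1", "s1"), ("u1", "s2")])
leaf2 = Node([("u2", "s1")])
root = Node([("u3", "s1")], children=[leaf1, leaf2])
result = root.aggr_merge()
```

Summing partial counts like this gives an exact answer only when the fragments are disjoint, as in a PT. On a DT with overlapping tuples the same tuple would be counted once per node holding it.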


2.2 Query Execution in AmbientDB (4)

join variants
    Broadcast join (LT, T1) -> T1
    Foreign-key join (T1, DT) -> T1
        Uses referential integrity to minimize communication
    Split join (LT1, T1) -> T1
        Reduces bandwidth consumption
        O(T*N) -> O(T*log(N))

partition
    A special operator that performs duplicate elimination
    Creates a PT from a DT by creating a tuple participation bitmap at all nodes
    To be able to use the dist operators, we should convert a DT to a PT
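
One way to picture the partition operator is to let a tuple participate only at the first node where it is seen. The sketch below is a centralized simulation under that assumption; the real operator runs distributed over the routing tree, and the function name and data layout are illustrative only.

```python
def partition(fragments):
    """Build per-node participation bitmaps that turn a DT into a PT.

    fragments: {node_id: [tuple, ...]}, possibly with duplicates across nodes.
    Returns {node_id: [bool, ...]}, True where the tuple participates.
    """
    seen = set()
    bitmaps = {}
    for node_id in sorted(fragments):       # fixed order keeps the result deterministic
        bitmap = []
        for row in fragments[node_id]:
            bitmap.append(row not in seen)  # first occurrence wins
            seen.add(row)
        bitmaps[node_id] = bitmap
    return bitmaps


dt = {
    1: [("s1", "Artist A"), ("s2", "Artist B")],
    2: [("s2", "Artist B"), ("s3", "Artist C")],
}
bitmaps = partition(dt)
```

After this step every distinct tuple is marked at exactly one node, so per-node results can be combined without counting anything twice.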


Mappings


2.3 Dataflow Execution (1)

Query processing paradigm
    The routing tree, built from TCP connections, is used to pass bi-directional tuple streams
    Multiple such waves run simultaneously (upward and downward)

Third translation phase
    Concrete query plan -> wave-plans
    Each concrete operator maps to one or more waves (local dataflow algebra operators)


2.3 Dataflow Execution (2)

dist plans for select, aggr, order and foreign-key join
    Buffer-to-buffer local operator in each node, without further communication
broadcast join
    Propagates a tuple wave through the network
split
    Split(<true,true>,<c1,c1>)
    Ordered -> effectively forming a DT/PT
scan-select, quick-sort, merge-join, heap-based top-N, ordered aggregation
    All stream-based
    Require little memory
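
As an illustration of why such stream-based operators need little memory, here is a heap-based top-N sketch that keeps at most N tuples in memory regardless of stream length. It uses Python's standard `heapq` module; the tuple layout and the `top_n` name are assumptions, not the AmbientDB operator itself.

```python
import heapq


def top_n(stream, n, key):
    """Return the n largest items of a stream using O(n) memory."""
    if n <= 0:
        return []
    heap = []  # min-heap of (key, sequence number, item)
    for seq, item in enumerate(stream):
        entry = (key(item), seq, item)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # New item beats the smallest kept item: replace it.
            heapq.heapreplace(heap, entry)
    return [item for _k, _s, item in sorted(heap, reverse=True)]


plays = iter([("s1", 3), ("s2", 9), ("s3", 1), ("s4", 7)])
best = top_n(plays, 2, key=lambda row: row[1])
```

The sequence number in each heap entry only breaks ties between equal keys, so the items themselves never need to be comparable.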


The Dataflow Algebra


2.4 Executing the Collaborative Filtering Query (1)


2.4 Executing the Collaborative Filtering Query (2)


2.4 Executing the Collaborative Filtering Query (3)

Problems
    Query 1
        Produces a large list of all users that have ever listened to the example song
        Hogs resources from all nodes in the network
    Query 2
        Basically sends all log records to the query node for aggregation

Can be executed more efficiently in an AmbientDB enriched with DHTs


3. DHTs in AmbientDB (1)

Useful lookup structures for large-scale P2P applications

Reduce the number of nodes involved in answering a query
    Involving many nodes
        Decreases query performance
        Creates an overload in the average query frequency

Gnutella (does not use DHTs or global indices)
    Easy to locate popular music
    Difficult to locate less well-known songs


3. DHTs in AmbientDB (2)

DHT indices can be exploited by a query optimizer to automatically accelerate selection (lookup) queries

A DHT is a special form of a PT, as the partitions are disjoint
    select_chord(DHT) : LT

Dataflow level
    Route a message to the Chord finger on which the selection key-value hashes
    Retrieve all corresponding tuples as an LT via a direct TCP/IP transfer

Non-complete index
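
The lookup step can be sketched as consistent hashing on an identifier ring: the key hashes to an identifier, and the node whose identifier is the first at or after it holds the matching tuples. This simplified sketch resolves the successor from a sorted list instead of following Chord finger tables, and the ring width, node identifiers, and helper names are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumed identifier width, chosen small for the example


def ring_id(value):
    """Hash a value onto the identifier ring."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def successor(node_ids, key_id):
    """First node identifier at or after key_id, wrapping around the ring."""
    ordered = sorted(node_ids)
    pos = bisect_left(ordered, key_id)
    return ordered[pos % len(ordered)]


def select_chord(index, node_ids, key):
    """Return the tuples stored for `key` on the responsible node as a local list."""
    owner = successor(node_ids, ring_id(key))
    return index.get(owner, {}).get(key, [])


nodes = [100, 20000, 45000]
index = {n: {} for n in nodes}

# Insert a tuple on the node responsible for its key, then look it up.
key = "song-42"
index[successor(nodes, ring_id(key))][key] = [("song-42", "user-7")]
found = select_chord(index, nodes, key)
```

Only the responsible node is contacted for the selection, which is the point of the index: the query does not have to be broadcast to every peer.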


DT and DHT in AmbientDB


3.1 Example: Approximated Collaborative Filtering (1)

HISTO
    Static histogram of fully-listened-to songs per user
    Reduces the histogram computation cost of the query
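
A sketch of how such a histogram could be precomputed from a play log, assuming each log record carries a user, a song, and a flag saying whether the song was played to the end. The record layout and the `build_histo` name are hypothetical; the slides do not show the HISTO schema.

```python
from collections import Counter, defaultdict


def build_histo(log):
    """Count fully-listened plays per (user, song) from an iterable of records."""
    histo = defaultdict(Counter)
    for user, song, fully_played in log:
        if fully_played:
            histo[user][song] += 1
    return histo


log = [
    ("u1", "s1", True),
    ("u1", "s1", True),
    ("u1", "s2", False),   # skipped song: not counted
    ("u2", "s1", True),
]
histo = build_histo(log)
```

Because the histogram is built ahead of time, a query can read the per-user counts directly instead of aggregating raw log records at query time.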


Optimized collaborative filtering query in SQL


3.1 Example: Approximated Collaborative Filtering (2)


3.1 Example: Approximated Collaborative Filtering (3)


Network Bandwidth Compared


4. Conclusion

Full query processing architecture
    Executing queries in a declarative, optimizable language, over an ad-hoc P2P network
DHT
    Efficient global indices