
TDD: Topics in Distributed Databases

Distributed Databases

Distributed database

Distributed query processing: joins and non-join queries

Updating distributed data


Distributed databases

Data is stored at several sites (nodes), distributed geographically or administratively across multiple systems

Each site runs an independent DBMS

What do we get?

– Increased availability and reliability
– Increased parallelism

Complications

– Catalog management: distributed data independence and distributed transaction atomicity
– Query processing and optimization: replication and fragmentation
– Increased update costs, concurrency control: locking, deadlock, commit protocol, recovery

Data centers

Architectures


Homogeneous vs heterogeneous systems

Homogeneous: identical DBMS, aware of each other, cooperate

Heterogeneous: different schemas/DBMS

– Multidatabase system: uniform logical view of the data -- common schema
– Difficult, yet common: the system is typically developed gradually

[Figure: a query is posed against the global schema and its answer returned; across the network, each site has its own DB, DBMS and local schema]


Architectures

Client-server: client (user interface, front end), server (DBMS)

– Client ships query to a server (query shipping)
– All query processing at server

[Figure: client-server architecture, with clients sending queries to servers and receiving answers]


Architectures

Collaborating server: query can span several servers

[Figure: collaborating-server architecture, with a query spanning several servers and the answer returned]

Middleware:

– Coordinator: queries and transactions across servers


Warehouse architecture

[Figure: warehouse architecture. Client applications access the data warehouse; an integrator feeds the warehouse from monitors/wrappers, one per data source (RDB, OODB, XML)]


Monitor/wrapper

A monitor/wrapper for each data source

Translation: translate an information source into a common integrating model

Change detection: detect changes to the underlying data source and propagate the changes to the integrator

– active databases (triggers: condition, event, action)
– logged sources: inspecting logs
– periodic polling, periodic dumps/snapshots

Data cleaning:

– detect erroneous/incomplete information to ensure validity
– back flushing: return cleaned data to the source
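Change detection by periodic snapshots can be illustrated with a small sketch. It assumes each snapshot is a dictionary from a key to a row; the function name `diff_snapshots` and the sample data are hypothetical and only illustrate what a monitor might report to the integrator.

```python
def diff_snapshots(old, new):
    """Compare two snapshots (dict: key -> row) and return the changes to report."""
    inserted = {k: new[k] for k in new.keys() - old.keys()}
    deleted = {k: old[k] for k in old.keys() - new.keys()}
    updated = {k: (old[k], new[k])
               for k in old.keys() & new.keys() if old[k] != new[k]}
    return inserted, deleted, updated


# Hypothetical snapshots taken at two polling times.
old_snapshot = {1: "a", 2: "b", 3: "c"}
new_snapshot = {2: "b", 3: "x", 4: "d"}
inserted, deleted, updated = diff_snapshots(old_snapshot, new_snapshot)
```

Only the three change sets need to be sent to the integrator, not the full snapshot.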


Integrator

Receive change notifications from the wrapper/monitors and reflect the changes in the data warehouse.

Typically a rule-based engine:

– merging information (data fusion)
– handling references

Data cleaning:

– removing redundancies and inconsistencies
– inserting default values
– blocking sources


When to use data warehouse

Problem: potential inconsistencies with the sources.

Commonly used for relatively “static” data

when clients require specific, predictable portions of the available information

when clients require high query performance but not necessarily the most recent state of the information

when clients want summarized/aggregated information such as historical information

Examples:

scientific data

historical enterprise data

caching frequently requested information


Data warehouse vs. materialized views

materialized view is over an individual structured database, while a warehouse is over a collection of heterogeneous, distributed data sources

materialized view typically has the same form as in the underlying database, while a warehouse stores highly integrated and summarized data

materialized view modifications occur within the same transaction updating its underlying database, while a warehouse may have to deal with independent sources:

– sources simply report changes

– sources may not have locking capability

– integrator is loosely coupled with the sources


Mediated system architecture

Virtual approach: data is not stored in the middle tier

[Figure: mediated architecture. Client applications query the mediator, which reaches each data source (RDB, OODB, XML) through a wrapper]


Lazy vs. eager approaches

Lazy approach (mediated systems):

– accept a query, determine the appropriate set of data sources, generate sub-queries for each data source
– obtain results from the data sources, perform translation, filtering and composing, and return the final answer

Eager approach (warehouses):

– information from each source that may be of interest is extracted in advance, translated, filtered, merged with relevant sources, and stored in a repository
– query is evaluated directly against the repository, without accessing the original information sources
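The contrast between the two approaches can be sketched in a few lines. This is a toy model under stated assumptions: sources are in-memory lists of (key, value) records, and the names `translate`, `mediator_query`, `build_warehouse` and `warehouse_query` are hypothetical.

```python
def translate(source_name, record):
    """Wrapper step: map a source record into a common model."""
    key, value = record
    return {"source": source_name, "key": key, "value": value}


def mediator_query(sources, predicate):
    """Lazy: contact every source at query time, then translate, filter and compose."""
    return [row
            for name, records in sources.items()
            for row in (translate(name, r) for r in records)
            if predicate(row)]


def build_warehouse(sources):
    """Eager: extract and translate everything in advance into a repository."""
    return [translate(name, r) for name, records in sources.items() for r in records]


def warehouse_query(repository, predicate):
    """Evaluate against the repository only; the sources are not accessed."""
    return [row for row in repository if predicate(row)]


# Hypothetical sources; the warehouse is built before a source changes.
sources = {"rdb": [("k1", 1)], "xml": [("k2", 2)]}
repository = build_warehouse(sources)
sources["rdb"].append(("k3", 3))

fresh = mediator_query(sources, lambda row: True)
stale = warehouse_query(repository, lambda row: True)
```

After the source update, the mediator sees three rows while the warehouse still holds two, which is the consistency issue the eager approach has to manage.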


Data warehouse vs. mediated systems

Efficiency

– response time: at the warehouse, queries can be answered efficiently without accessing original data sources. Advantageous when data sources are slow, expensive or periodically unavailable, or when translation, filtering and merging require significant processing

– space: warehousing consumes extra storage space

Extensibility: warehouse

Consistency with the sources: warehouse data may become out of date

Applicability:

– warehouses: for high query performance and static data
– mediated systems: for information that changes rapidly


Distributed data storage -- replication

Fragments of a relation are replicated at several sites: R is fragmented into R1, R2, R3

Why?

– Increase availability/reliability: if one site fails
– Increase parallelism: faster query evaluation
– Increase overhead on updates: consistency

Dynamic issues: synchronous vs. asynchronous

[Figure: fragments R1, R2, R3 placed at Site 1 and Site 2, with R2 stored at both sites]

Primary copy: e.g.,

– Bank: an account at the site in which it was opened
– Airline: a flight at the site from which it originates


Distributed data storage -- fragmentation

A relation R may be fragmented or partitioned

– Horizontal
– Vertical: lossless join

Question: how to reconstruct the original R?

[Figure: a query is posed against the global schema and its answer returned; across the network, each site has its own DB, DBMS and local schema]

eid   name    city
001   joe     NYC
002   mary    NYC
003   grace   EDI

Fragmentation: determined by local ownership (here, one fragment for NYC and one for EDI)
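One way to approach the reconstruction question is sketched here: a horizontally fragmented relation is rebuilt by union, and a vertically fragmented one by joining the fragments on a shared key. The helper names are hypothetical, and the sketch assumes every vertical fragment keeps the key `eid`, which is what makes the join lossless.

```python
# The employee relation R(eid, name, city) from the example above.
R = [
    {"eid": "001", "name": "joe", "city": "NYC"},
    {"eid": "002", "name": "mary", "city": "NYC"},
    {"eid": "003", "name": "grace", "city": "EDI"},
]


def horizontal_fragments(rows, attr):
    """Partition rows by the value of attr (one fragment per value)."""
    frags = {}
    for row in rows:
        frags.setdefault(row[attr], []).append(row)
    return frags


def vertical_fragments(rows, key, attr_groups):
    """Split columns into groups; every fragment keeps the key."""
    return [[{a: row[a] for a in [key] + group} for row in rows]
            for group in attr_groups]


def union(frags):
    """Reconstruct a horizontally fragmented relation: R = R1 ∪ ... ∪ Rn."""
    return [row for frag in frags for row in frag]


def join_on_key(frags, key):
    """Reconstruct a vertically fragmented relation by joining on the key."""
    merged = {}
    for frag in frags:
        for row in frag:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


def as_set(rows):
    """Order-insensitive view of a relation, for comparison."""
    return {tuple(sorted(r.items())) for r in rows}


h = horizontal_fragments(R, "city")
v = vertical_fragments(R, "eid", [["name"], ["city"]])
```

Here `union(h.values())` and `join_on_key(v, "eid")` both yield the original tuples of R, up to ordering.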


Transparency, independence

Distributed data transparency (independence):

– location (name) transparency

– fragmentation transparency

– replication transparency (catalog management)

Transaction atomicity: across several sites

– All changes persist if the transaction commits

– None persists if the transaction aborts

Data independence and transaction atomicity are not currently supported: users have to be aware of where data is located

Distributed query processing: joins and non-join queries


Distributed query processing and optimization

New challenges

– Data transmission costs (network)
– Parallelism
– Choice of replicas: lowest transmission cost
– Fragmentation: to reconstruct the original relation

Query decomposition: query rewriting/unfolding, depending on how data is fragmented/replicated

[Figure: a query against the global schema is decomposed into sub-queries, one per site (each with its own DB, DBMS and local schema), and the answer is returned]


Non-join queries

Schema: account(acc-num, branch-name, balance)

Query Q: select * from account where branch-name = 'EDI'

Storage: database DB is horizontally fragmented, based on branch-name: NYC, Philly, EDI, …, denoted by DB1, …, DBn

DB = DB1 ∪ … ∪ DBn

Processing:

– Rewrite Q into Q(DB1) ∪ … ∪ Q(DBn)
– Q(DBi) is empty if the branch-name of DBi <> EDI
  • Only Q(DB1) remains, where DB1 is the EDI branch
– Q(DB1) = Q’(DB1)
  • Q’: select * from account
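The rewriting above can be sketched as follows. The fragment contents are made up for illustration, and the function names `q_naive` and `q_rewritten` are hypothetical; the sketch counts how many sites each strategy has to contact.

```python
# Hypothetical fragments DB1..DBn of account(acc-num, branch-name, balance),
# keyed by the branch-name that each site owns.
fragments = {
    "EDI": [("A-101", "EDI", 500), ("A-102", "EDI", 900)],
    "NYC": [("A-201", "NYC", 700)],
    "Philly": [("A-301", "Philly", 300)],
}


def q_naive(frags, branch):
    """Q(DB1) ∪ ... ∪ Q(DBn): evaluate the selection at every site."""
    result, sites_contacted = [], 0
    for rows in frags.values():
        sites_contacted += 1
        result += [r for r in rows if r[1] == branch]
    return result, sites_contacted


def q_rewritten(frags, branch):
    """Only the matching fragment can contribute, and there Q reduces to Q'."""
    if branch not in frags:
        return [], 0
    return list(frags[branch]), 1  # Q': select * from account


naive_rows, naive_sites = q_naive(fragments, "EDI")
pruned_rows, pruned_sites = q_rewritten(fragments, "EDI")
```

Both strategies return the same rows, but the rewritten query touches one site instead of all of them and needs no selection predicate there.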


Simple join processing – data shipping

R1 ⋈ R2 ⋈ R3, where Ri is at site i (Si), and S0 is the site where the query is issued

Option 1: send copies of R1, R2, R3 to S0 and compute the joins at S0

Option 2:

– Send R1 to S2, compute temp1 ← R1 ⋈ R2 at S2
– Send temp1 to S3, compute temp2 ← R3 ⋈ temp1 at S3
– Send temp2 to S0

Decision on strategies:

– The volume of data shipped
– The cost of transmitting a block
– Relative speed of processing at each site


Semijoin – reduce communication costs

R1 ⋈ R2, where Ri is at site i (Si)

– Compute temp1 ← π_{R1 ∩ R2}(R1) at site 1: projection on the join attributes only (R1 ∩ R2 denotes the attributes common to R1 and R2); assume R1 is smaller
– Ship temp1 to site 2, instead of the entire relation R1
– Compute temp2 ← R2 ⋈ temp1 at S2
– Ship temp2 to site 1
– Compute result ← R1 ⋈ temp2 at S1

Effectiveness

– If a sufficiently small fraction of R2 contributes to the join
– Additional computation cost may be higher than the savings in communication costs
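The semijoin steps above can be sketched on two small in-memory relations. The relations, the function names and the cost measure (number of attribute values shipped) are assumptions made for illustration only.

```python
# Hypothetical relations: R1(a, b) at site 1, R2(b, c) at site 2; b is the join attribute.
R1 = [(1, "x"), (2, "y")]
R2 = [("x", 10), ("y", 20), ("z", 30), ("w", 40)]


def semijoin_plan(r1, r2):
    """Return the join result and the number of attribute values shipped."""
    shipped = 0
    # Site 1: project R1 on the join attribute only.
    temp1 = {b for (_, b) in r1}
    shipped += len(temp1)                       # ship temp1 to site 2
    # Site 2: keep only the R2 tuples that join with temp1.
    temp2 = [t for t in r2 if t[0] in temp1]
    shipped += sum(len(t) for t in temp2)       # ship temp2 to site 1
    # Site 1: final join of R1 with the reduced R2.
    result = [(a, b, c) for (a, b) in r1 for (b2, c) in temp2 if b == b2]
    return result, shipped


def ship_whole_plan(r1, r2):
    """Baseline: ship all of R2 to site 1 and join there."""
    result = [(a, b, c) for (a, b) in r1 for (b2, c) in r2 if b == b2]
    return result, sum(len(t) for t in r2)


semi_result, semi_shipped = semijoin_plan(R1, R2)
base_result, base_shipped = ship_whole_plan(R1, R2)
```

In this example only half of R2 joins with R1, so the semijoin plan ships fewer values; when most of R2 participates, the extra round trip may not pay off.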


Bloomjoin – reduce communication costs

R1 ⋈ R2, where Ri is at site i (Si)

– Compute a bit vector of size k by hashing π_{R1 ∩ R2}(R1)
  • a bit is set to 1 if some tuple hashes to it
  • smaller than the projection (constant size)
– Ship the vector to site 2, instead of the entire relation R1
– Hash R2 using the same hash function
– Ship to site 1 only those tuples of R2 that also hash to a bit set to 1: temp1
– Compute result ← R1 ⋈ temp1 at S1

Effectiveness

– Lower communication costs: bit vector vs. projection
– The reduced R2 obtained by hashing may be larger than the one obtained by projection, since hash collisions let through tuples that do not join

Question: set difference?
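The Bloomjoin steps above can be sketched as follows. The vector size `K`, the use of `zlib.crc32` as the hash function and the sample relations are all assumptions for illustration; any deterministic hash shared by both sites would do.

```python
import zlib

K = 16  # bit-vector size k; a hypothetical choice

# Hypothetical relations: R1(a, b) at site 1, R2(b, c) at site 2; b is the join attribute.
R1 = [(1, "x"), (2, "y")]
R2 = [("x", 10), ("y", 20), ("z", 30), ("w", 40)]


def h(value):
    """Deterministic hash into [0, K); crc32 is an arbitrary choice for this sketch."""
    return zlib.crc32(str(value).encode()) % K


def bloomjoin(r1, r2):
    # Site 1: set bit h(b) for every join value b in R1, then ship the vector.
    bits = [0] * K
    for (_, b) in r1:
        bits[h(b)] = 1
    # Site 2: ship only the R2 tuples whose join value hashes to a set bit.
    temp1 = [t for t in r2 if bits[h(t[0])]]
    # Site 1: the real join discards false positives caused by hash collisions.
    result = [(a, b, c) for (a, b) in r1 for (b2, c) in temp1 if b == b2]
    return result, temp1


bloom_result, bloom_temp1 = bloomjoin(R1, R2)
```

The vector never filters out a joining tuple, so the final join at site 1 is exact; collisions only make `temp1` larger than strictly necessary.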


Exploring parallelism

Consider R1 ⋈ R2 ⋈ R3 ⋈ R4, where Ri is at site i

– temp1 ← R1 ⋈ R2, by shipping R1 to site 2
– temp2 ← R3 ⋈ R4, by shipping R3 to site 4
– result ← temp1 ⋈ temp2 -- pipelining

temp1 and temp2 are computed in parallel

Question: R1 ⋈ R2 ⋈ R3, using

– Pipelined parallelism
– Semi-join
– both


Distributed query optimization

– The volume of data shipped
– The cost of transmitting a block
– Relative speed of processing at each site
– Site selection: replication

Two-step optimization

– At compile time, generate a query plan, along the same lines as in a centralized DBMS
– Every time before the query is executed, transform the plan and carry out site selection (determine where the operators are to be executed): dynamic, just site selection


Practice: validation of functional dependencies

A functional dependency (FD) defined on schema R: X → Y

– For any instance D of R, D satisfies the FD if for any pair of tuples t and t’ in D, if t[X] = t’[X], then t[Y] = t’[Y]
– Violations of the FD in D: { t | there exists t’ in D, such that t[X] = t’[X], but t[Y] ≠ t’[Y] }

Now suppose that D is fragmented (horizontally or vertically) and distributed

Develop an algorithm that, given a fragmented and distributed D and an FD, computes all the violations of the FD in D, while minimizing data shipment

– semijoin
– bloomjoin

Question: what can we do if we are given a set of FDs to validate?
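As a starting point for the exercise, here is one possible sketch for the horizontally fragmented case only. It is not the intended solution, just a semijoin-flavoured idea: sites ship distinct (X, Y) projections rather than whole tuples, and a coordinator sends back the violating X values. The function name and sample data are hypothetical, and everything runs in one process for illustration.

```python
from collections import defaultdict


def violations(fragments, X, Y):
    """Tuples violating X -> Y in the union of horizontally fragmented data."""
    # Step 1 (sites -> coordinator): distinct projections on X and Y.
    ys_for_x = defaultdict(set)
    for frag in fragments:
        for t in frag:
            ys_for_x[tuple(t[a] for a in X)].add(tuple(t[a] for a in Y))
    # Step 2 (coordinator): X values associated with more than one Y value.
    bad_x = {x for x, ys in ys_for_x.items() if len(ys) > 1}
    # Step 3 (coordinator -> sites): each site selects its tuples with a bad X value.
    return [t for frag in fragments for t in frag
            if tuple(t[a] for a in X) in bad_x]


# Hypothetical fragments at two sites, checked against the FD zip -> city.
site1 = [{"zip": "EH8", "city": "EDI"}, {"zip": "10001", "city": "NYC"}]
site2 = [{"zip": "EH8", "city": "LDN"}]
found = violations([site1, site2], X=["zip"], Y=["city"])
```

Neither site violates the FD locally; the violation only shows up once the projections from both sites are combined.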


Practice: relational operators

Consider a relation R that is vertically partitioned and distributed across n sites

Develop an algorithm to implement

– π_A(R),
– σ_C(R)

by using

– semijoin
– bloomjoin

Column-oriented DBMS: store tables as sections of columns of data, rather than rows (tuples); good for, eg, certain aggregate queries


Practice: relational operators

Consider relations R1 and R2 that are horizontally partitioned and distributed across n sites

Develop an algorithm to implement R1 ⋈_{R1.A = R2.B} R2 by using

– semijoin
– bloomjoin

Question: is your algorithm parallel scalable? That is, the more processors are used, the faster it is

Updating distributed data


Updating Distributed data

Fragmentation: an update may go across several sites

– Local transaction
– Global transaction

Replication: multiple copies of the same data -- consistency

[Figure: updates posed against the global schema are propagated across the network to the sites, each with its own DB, DBMS and local schema]


System structure

Local transaction manager: handles either a local transaction or part of a global transaction

– Maintain a log for recovery
– Concurrency control for transactions at the site

Transaction coordinator (not in centralized DBMS)

– Start the execution of a transaction
– Break a transaction into a number of sub-transactions and distribute them to the appropriate sites
– Coordinate the termination of the transaction
  • Commit at all sites
  • Abort at all sites


Two-Phase Commit protocol (2PC): Phase 1

Transaction T; the transaction coordinator is at C

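[Figure: coordinator C and participants P1, P2, P3]

(1) C writes <prepare T> to its log and sends prepare T to every participant

(2) Each participant either writes <ready T> to its log and replies <ready T>, or writes <no T> to its log and replies <abort T>

C decides commit T if all responses are <ready>, and abort T if one of the responses is <abort>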


Two-Phase Commit protocol (2PC): Phase 2

Transaction T; the transaction coordinator is at C

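[Figure: coordinator C and participants P1, P2, P3, continuing from Phase 1 (prepare T, <ready T>)]

(3) C writes <commit T> to its log and sends commit T to every participant

(4) Each participant writes <commit T> to its log and replies <ack T>

C writes <complete T> to its log once all acknowledgements have arrived

Similarly for abort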
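The two phases can be traced with a minimal single-process sketch. It assumes no failures, timeouts or message loss, so it shows only the logging and voting order, not recovery; the function name and the way votes are modelled (a callable per participant) are hypothetical.

```python
def two_phase_commit(txn, participants):
    """participants: dict name -> vote function returning True (ready) or False (abort)."""
    coord_log = [f"<prepare {txn}>"]            # (1) coordinator logs, sends prepare T
    logs = {p: [] for p in participants}
    votes = {}
    for p, vote in participants.items():        # (2) each participant logs its vote
        votes[p] = vote(txn)
        logs[p].append(f"<ready {txn}>" if votes[p] else f"<no {txn}>")
    decision = "commit" if all(votes.values()) else "abort"
    coord_log.append(f"<{decision} {txn}>")     # (3) coordinator logs the decision
    for p in participants:                      # (4) participants log it and acknowledge
        logs[p].append(f"<{decision} {txn}>")
    coord_log.append(f"<complete {txn}>")       # all acknowledgements received
    return decision, coord_log, logs


all_ready = {"P1": lambda t: True, "P2": lambda t: True, "P3": lambda t: True}
one_abort = {"P1": lambda t: True, "P2": lambda t: False, "P3": lambda t: True}

d1, clog1, plogs1 = two_phase_commit("T", all_ready)
d2, clog2, plogs2 = two_phase_commit("T", one_abort)
```

A single negative vote is enough to abort the transaction at every site, which is the atomicity guarantee the protocol is after.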


Comments on 2PC

Two rounds of communication, both initiated at the coordinator

– Voting
– Terminating

Any site can decide to abort a transaction

Every message reflects a decision by the sender

The decision is recorded in the log, together with the transaction id, to ensure the decision survives any failure

– The log at each participating site: the id of the coordinator
– The log at the coordinator: ids of the participating sites


Concurrency control

Single lock manager: at a chosen site

– Simple implementation and deadlock handling
– Bottleneck
– Vulnerability: if the site fails

Distributed lock manager: each site has one; to update data item D at site j

– Send request to the lock manager at site j
– Request is either granted or delayed
– Deadlock handling is hard

Major complication: replicated data


Replication

Synchronous replication: all copies must be updated before the transaction commits

– data distribution is transparent, consistent
– expensive

Asynchronous replication: copies are periodically updated

– Allows the modifying transaction to commit before all copies have been changed
– users are aware of data distribution, consistency issues
– Cheaper: current products follow this approach
– Peer-to-peer replication (master copies)
– Primary site replication (only the primary is updateable)


Synchronous replication

Majority approach -- voting: data item D replicated at n sites

– A lock request is sent to more than one-half of the sites
– D is locked if the majority vote yes; write ⌊n/2⌋ + 1 copies
– Each copy maintains a version number
– Expensive
  • 2(⌊n/2⌋ + 1) messages for lock
  • Read: at least ⌊n/2⌋ + 1 copies to make sure it is current
  • Deadlock is more complicated: if only one copy is locked


Synchronous replication (cont.)

Majority approach -- voting

Biased protocol: read-any write-all

– Shared lock (read): simply requests a lock on D at one site that contains a copy of D
– Exclusive lock (write): lock on all sites that contain a copy
– Less overhead on read, expensive on write
– Commonly used approach to synchronous replication


Synchronous replication -- exercise

A distributed system uses the majority (voting) approach to update data replicas. Suppose that a data item D is replicated at 4 different sites: S1, S2, S3, S4. What should be done if a site S wants to

– Write D
– Read D


Synchronous replication -- answer

A distributed system uses the majority (voting) approach to update data replicas. Suppose that a data item D is replicated at 4 different sites: S1, S2, S3, S4. What should be done if a site S wants to

Write D

– The site S sends a lock request to any 3 of S1, S2, S3, S4
– The write operation is conducted if the lock is granted by the lock managers of all 3 sites; otherwise it is delayed until the lock can be granted

Read D

– The site S reads copies of D from at least 3 sites
– It picks the copy with the highest version number
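The answer above can be checked with a small sketch of majority reads and writes over four replicas. The `Replica` class and the `write`/`read` helpers are hypothetical; locking is reduced to a flag check, and the sketch is single-threaded, so it illustrates only why any two majorities overlap.

```python
class Replica:
    def __init__(self):
        self.value, self.version, self.locked = None, 0, False


def majority(n):
    """Smallest number of sites that is more than one-half of n."""
    return n // 2 + 1


def write(replicas, chosen, value):
    """Write via a majority of replicas; 'chosen' lists the replica indices contacted."""
    assert len(chosen) >= majority(len(replicas))
    if any(replicas[i].locked for i in chosen):
        return False                             # lock not granted: the write is delayed
    new_version = max(replicas[i].version for i in chosen) + 1
    for i in chosen:
        replicas[i].value, replicas[i].version = value, new_version
    return True


def read(replicas, chosen):
    """Read a majority and return the copy with the highest version number."""
    assert len(chosen) >= majority(len(replicas))
    newest = max((replicas[i] for i in chosen), key=lambda r: r.version)
    return newest.value


replicas = [Replica() for _ in range(4)]         # S1..S4
write(replicas, [0, 1, 2], "v1")                 # any 3 of the 4 sites
write(replicas, [1, 2, 3], "v2")                 # a different majority
```

After these two writes, replica S1 still holds the old value, yet every read of 3 sites includes at least one site with the latest version.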


Asynchronous replication

Primary site: choose exactly one copy, residing at a primary site

– A lock request is sent to the primary site
– Replicas at other sites may not be updated; they are secondary to the primary copy
– Simple to implement
– Main issues:
  • D becomes inaccessible if the primary site fails
  • Propagation of changes from the primary site to others
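The primary-site scheme can be sketched with a toy class. The class name `PrimarySite`, its methods and the explicit `propagate` call standing in for periodic propagation are all hypothetical, and failure of the primary is not modelled.

```python
class PrimarySite:
    def __init__(self, n_secondaries):
        self.primary = {}
        self.secondaries = [{} for _ in range(n_secondaries)]
        self.pending = []                        # changes not yet propagated

    def write(self, key, value):
        """Updates go to the primary copy only and commit immediately."""
        self.primary[key] = value
        self.pending.append((key, value))

    def read_secondary(self, i, key):
        """Reads at a secondary may return stale data."""
        return self.secondaries[i].get(key)

    def propagate(self):
        """Periodic propagation of changes from the primary to all secondaries."""
        for key, value in self.pending:
            for secondary in self.secondaries:
                secondary[key] = value
        self.pending.clear()


system = PrimarySite(n_secondaries=2)
system.write("D", 1)
before = system.read_secondary(0, "D")           # not yet propagated
system.propagate()
after = system.read_secondary(0, "D")
```

The window between `write` and `propagate` is where users can observe the inconsistency that asynchronous replication tolerates.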


Asynchronous replication (cont.)

Primary site

Peer-to-peer: more than one of the copies can be a master

– Change to a master copy must be propagated to others
– Conflicts of changes to two copies have to be resolved
– Best used when conflicts do not arise: e.g.,
  • Each master site owns a distinct fragment
  • Updating rights owned by one master at a time


Distributed deadlock detection

Recall wait-for graph

– Nodes: transactions
– Edges: T1 → T2 if T1 requests a resource being held by T2

Local wait-for graph:

– Nodes: all transactions local or holding/requesting data item local to the site
– T1 → T2 if T1 (at site 1) requests a resource being held by T2 (at site 2)

Global wait-for graph: union of local wait-for graphs

– Deadlock if it contains a cycle

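[Figure: local wait-for graphs at Site 1 and Site 2 over transactions T1 to T5, with T2 and T3 appearing at both sites, and the global wait-for graph obtained as their union]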
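Deadlock detection on the global wait-for graph can be sketched as a union of local graphs followed by a cycle check. The edges below are hypothetical and are not taken from the figure; the function names are also assumptions.

```python
def global_wait_for(*local_graphs):
    """Union of local wait-for graphs (dict: transaction -> set it waits for)."""
    graph = {}
    for local in local_graphs:
        for t, waits in local.items():
            graph.setdefault(t, set()).update(waits)
            for u in waits:
                graph.setdefault(u, set())
    return graph


def has_cycle(graph):
    """Depth-first search; reaching a node on the current path means deadlock."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in graph}

    def visit(t):
        color[t] = GREY
        for u in graph[t]:
            if color[u] == GREY or (color[u] == WHITE and visit(u)):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in graph)


# Hypothetical local graphs: neither contains a cycle on its own.
site1 = {"T1": {"T2"}, "T2": {"T3"}}
site2 = {"T3": {"T4"}, "T4": {"T2"}}
combined = global_wait_for(site1, site2)
```

In this example the cycle T2 → T3 → T4 → T2 exists only in the union, which is why local detection alone is not sufficient.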


False cycles

Due to communication delay

– Site 1: local wait-for graph has T1 → T2
– T2 releases the resource: deletion of the edge
– Site 2: T2 requests the resource again: addition of T2 → T1
– Cycle if the insertion of T2 → T1 arrives before the removal of T1 → T2

Centralized deadlock detection: deadlock detection coordinator

– Constructs/maintains the global wait-for graph
– Detects cycles
– If it finds a cycle, chooses a victim to be rolled back

Distributed deadlock manager? More expensive

[Figure: wait-for graphs at Site 1 and Site 2 and the global graph, showing the false cycle between T1 and T2]
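The effect of message ordering can be reproduced with a tiny simulation of the coordinator's view. The message format and the function name `coordinator_view` are hypothetical; the sketch handles only the two transactions of this example.

```python
def coordinator_view(messages):
    """Apply ('add' | 'remove', edge) messages in arrival order.

    Returns True if the coordinator ever observes a cycle between T1 and T2.
    """
    edges, saw_cycle = set(), False
    for op, edge in messages:
        if op == "add":
            edges.add(edge)
        else:
            edges.discard(edge)
        # With only two transactions, a cycle means both directions are present.
        if ("T1", "T2") in edges and ("T2", "T1") in edges:
            saw_cycle = True
    return saw_cycle


initial = [("add", ("T1", "T2"))]
# Messages arrive in the order in which the events happened.
in_order = initial + [("remove", ("T1", "T2")), ("add", ("T2", "T1"))]
# The removal from Site 1 is delayed behind the insertion from Site 2.
delayed = initial + [("add", ("T2", "T1")), ("remove", ("T1", "T2"))]
```

The same three events produce a cycle only in the delayed ordering, even though no deadlock ever existed in the system.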


Summary and review

– Homogeneous vs heterogeneous systems
– Replication and fragmentation. Pros and cons of replication. How to reconstruct a fragmented relation (vertical, horizontal)?
– Simple join (data shipping), semijoin, bloomjoin
  • Set difference? Intersection? Aggregation?
– Transaction manager and coordinator. Responsibilities?
– Describe 2PC. Recovery: coordinator and participating sites
– Replication:
  • majority, read-any write-all
  • primary site, peer-to-peer
– Local wait-for graph, global wait-for graph, deadlock detection, deadlock handling


Reading list

MapReduce tutorial:

http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html

Take a look at the following:

– Cassandra, http://en.wikipedia.org/wiki/Apache_Cassandra

– Clusterpoint, http://en.wikipedia.org/wiki/Clusterpoint

– Riak, http://en.wikipedia.org/wiki/Riak

Reading for the next week

1. Pregel: a system for large-scale graph processing

http://kowshik.github.io/JPregel/pregel_paper.pdf

2. Distributed GraphLab: A Framework for Machine Learning in the Cloud, http://vldb.org/pvldb/vol5/p716_yuchenglow_vldb2012.pdf

3. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, http://select.cs.cmu.edu/publications/paperdir/osdi2012-gonzalez-low-gu-bickson-guestrin.pdf

4. GraphChi: Large-Scale Graph Computation on Just a PC, http://select.cs.cmu.edu/publications/paperdir/osdi2012-kyrola-blelloch-guestrin.pdf

5. Performance Guarantees for Distributed Reachability Queries, http://vldb.org/pvldb/vol5/p1304_wenfeifan_vldb2012.pdf

6. W. Fan, X. Wang, and Y. Wu. Distributed Graph Simulation: Impossibility and Possibility. VLDB 2014. (parallel model)

http://homepages.inf.ed.ac.uk/wenfei/papers/vldb14-impossibility.pdf

Pick two papers and write reviews