Distributed databases A brief introduction (Figure numbers may not be the same as in the book)...


Distributed databases 1

Distributed databases

A brief introduction (Figure numbers may not be the same as in the book)

Distributed database concepts

• Distributed database (DDB)
  – Collection of multiple logically interrelated databases distributed over a computer network
• Distributed database management systems (DDBMS)
  – Software systems managing a distributed database, making distribution transparent to the users

Transparency

• Hiding implementation details from the users of the database
• Data organization transparency
  – Location transparency
    • Use does not depend on location
  – Naming transparency
    • Naming is independent of location
• Replication transparency
  – Copies can be kept for availability, performance, and reliability
    • Users are unaware of the existence of these copies
• Fragmentation transparency
  – One table is divided across several locations
  – Horizontal fragmentation
    • Table divided by rows
  – Vertical fragmentation
    • Table divided by columns

Example: Replication and horizontal fragmentation

Reliability and Availability

• Two common advantages of distributed databases

• Reliability
  – The probability that a system is running at a certain time point
• Availability
  – The probability that a system is continuously available during a time interval

Advantages of distributed databases

1. Improved ease and flexibility of application development
   – Transparency: Developers do not have to know …
2. Increased reliability and availability
   – Faults are isolated to a single site
3. Improved performance
   – Data localization means less network traffic
   – Parallelism
4. Easier expansion
   – Easy to add more data, processors, etc.

Types of distributed database systems

• Degree of homogeneity
  – Homogeneous: All local DBMSs run identical software
  – Heterogeneous: Local DBMSs run different software
• Autonomy
  – Local autonomy: Local site can function as a standalone DBMS
  – No autonomy: Local site cannot function as a standalone DBMS

Classification of distributed databases

Database system architectures

General architecture

Component architecture of distributed databases

Data fragmentation

• Which site should store which portion of the database?
• Simple fragmentation
  – Each site has a whole relation
• Horizontal fragmentation
  – Subset of rows in each site
    • Sometimes based on location
• Vertical fragmentation
  – Subset of columns in each site
    • Primary key must be in all sites
• Mixed / hybrid fragmentation
  – Horizontal + vertical fragmentation
  – Described by fragmentation schema
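
The fragmentation types above can be sketched in a few lines of Python. This is only an illustration: the EMPLOYEE relation, its column names, and the helper functions are all hypothetical and not taken from the slides. The point is that a horizontal fragment is a selection of rows, a vertical fragment is a projection that keeps the primary key, and the original relation can be rebuilt by union or by a join on that key.

```python
# Hypothetical EMPLOYEE relation; columns and rows are illustrative only.
EMPLOYEE = [
    {"id": 1, "name": "Ann", "salary": 52000, "city": "Oslo"},
    {"id": 2, "name": "Bob", "salary": 48000, "city": "Bergen"},
    {"id": 3, "name": "Cy", "salary": 61000, "city": "Oslo"},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate (a selection)."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, columns, key="id"):
    """Keep a subset of columns; the primary key is always included."""
    wanted = [key] + [c for c in columns if c != key]
    return [{c: r[c] for c in wanted} for r in rows]


def rebuild_horizontal(*fragments, key="id"):
    """Union of horizontal fragments, ordered by primary key."""
    merged = {r[key]: r for frag in fragments for r in frag}
    return [merged[k] for k in sorted(merged)]


def rebuild_vertical(left, right, key="id"):
    """Join two vertical fragments on the primary key."""
    right_by_key = {r[key]: r for r in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal fragmentation based on location.
oslo = horizontal_fragment(EMPLOYEE, lambda r: r["city"] == "Oslo")
bergen = horizontal_fragment(EMPLOYEE, lambda r: r["city"] == "Bergen")

# Vertical fragmentation: both fragments carry the primary key "id".
public = vertical_fragment(EMPLOYEE, ["name", "city"])
payroll = vertical_fragment(EMPLOYEE, ["salary"])
```

Because every vertical fragment carries the primary key, `rebuild_vertical(public, payroll)` can match rows unambiguously; without the key in all sites the join would not be possible.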

Example fragmentation

Example fragmentation, continued

Data replication

• Replication to improve availability
• Fully replicated database
  – All data is replicated to each site
• No replication
  – Each piece of data is stored at exactly one site
• Partial replication
  – Some data is replicated to some sites
  – Described by replication schema
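
One way to picture a replication schema is as a mapping from each fragment to the sites that hold a copy. The sketch below assumes that representation; the fragment names, site names, and both helper functions are hypothetical. It classifies a schema as one of the three strategies above and shows why extra copies help availability when a site fails.

```python
# Hypothetical replication schema: fragment name -> sites holding a copy.
ALL_SITES = frozenset({"site1", "site2", "site3"})

REPLICATION_SCHEMA = {
    "EMP_OSLO": {"site1", "site2"},
    "EMP_BERGEN": {"site2"},
    "DEPARTMENT": {"site1", "site2", "site3"},
}


def classify(schema, all_sites=ALL_SITES):
    """Return which of the three replication strategies the schema matches."""
    copies = [set(sites) for sites in schema.values()]
    if all(c == set(all_sites) for c in copies):
        return "full replication"
    if all(len(c) == 1 for c in copies):
        return "no replication"
    return "partial replication"


def readable_fragments(schema, failed_sites):
    """Fragments that still have at least one copy on a running site."""
    failed = set(failed_sites)
    return sorted(name for name, sites in schema.items() if set(sites) - failed)
```

With this schema, a failure of `site2` makes `EMP_BERGEN` unreadable because its only copy lives there, while the replicated fragments stay available.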

Distributed query processing

1. Query mapping
   – Query mapped from SQL to relational algebra using the global conceptual schema
2. Localization
   – Map query on the global schema to separate queries on the local schemas
   – Using fragmentation and replication information
3. Global query optimization
   – Cost = CPU time + I/O time + communication time
4. Local query optimization
   – Same as in centralized databases
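
The cost formula in step 3 can be illustrated with a toy comparison of two execution plans. Everything concrete here is an assumption made for the example: the `Plan` class, the plan descriptions, the timings, the byte counts, and the idea of deriving communication time from bytes shipped divided by bandwidth. A real optimizer uses far richer statistics.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical execution plan with made-up cost figures."""
    name: str
    cpu_s: float        # estimated CPU time in seconds
    io_s: float         # estimated I/O time in seconds
    bytes_shipped: int  # data sent over the network


def total_cost(plan, bandwidth_bytes_per_s):
    """Cost = CPU time + I/O time + communication time."""
    communication_s = plan.bytes_shipped / bandwidth_bytes_per_s
    return plan.cpu_s + plan.io_s + communication_s


def cheapest(plans, bandwidth_bytes_per_s):
    """Pick the plan with the lowest total estimated cost."""
    return min(plans, key=lambda p: total_cost(p, bandwidth_bytes_per_s))


PLANS = [
    Plan("ship both relations to the result site", 0.2, 0.5, 1_003_500),
    Plan("ship the small relation, then the join result", 0.3, 0.6, 403_500),
]

slow_network_choice = cheapest(PLANS, 1_000_000)    # about 1 MB/s
fast_network_choice = cheapest(PLANS, 100_000_000)  # about 100 MB/s
```

On the slow network the communication term dominates, so the plan that ships less data is chosen; on the fast network the plan with less CPU and I/O work is chosen instead, even though it ships more bytes.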

Distributed transaction management, Two-phase commit protocol (2PC)

• Global transaction manager / coordinator
  – Coordinates the results of local transaction managers
  – All local transaction managers must be able to "commit" before actually doing the "commit"
• Two-phase commit protocol (2PC)
  – Phase 1
    • Individual databases tell the coordinator that they have finished the transaction
    • When all individual databases have finished, the coordinator sends "prepare for commit" to all databases
    • Individual databases answer "ready to commit" or "cannot commit"
  – Phase 2
    • If all databases answered "ready to commit", the coordinator sends "commit" to all databases
    • If one (or more) databases answered "cannot commit", the coordinator sends "abort" to all databases
    • Timeout: if one (or more) databases do not answer within a given amount of time, the coordinator sends "abort"
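
The two phases can be sketched as a small in-memory simulation. This is a simplified illustration, not a production protocol: the `Participant` class, the `Vote` enum, and the use of `None` to stand in for a participant that does not answer before the timeout are all assumptions made for the example. There is no real network, logging, or crash recovery.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready to commit"
    CANNOT = "cannot commit"


class Participant:
    """Hypothetical local transaction manager at one site."""

    def __init__(self, name, vote=Vote.READY):
        self.name = name
        self._vote = vote      # None models "no answer before the timeout"
        self.state = "active"

    def prepare(self):
        """Answer the coordinator's "prepare for commit" message."""
        return self._vote

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run both phases and return the global decision."""
    # Phase 1: ask every participant whether it is ready to commit.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only if every single answer was "ready to commit".
    if all(v is Vote.READY for v in votes):
        for p in participants:
            p.commit()
        return "commit"

    # Any "cannot commit" or missing answer (timeout) aborts everywhere.
    for p in participants:
        p.abort()
    return "abort"
```

For example, `two_phase_commit([Participant("A"), Participant("B", Vote.CANNOT)])` returns `"abort"` and leaves both participants in the `"aborted"` state, so no site commits unless all of them can.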

Two-phase commit protocol (2PC)

• Problems with 2PC
  – Coordinator crashes: All participating sites are waiting
  – No way of knowing whether participating sites really got the "commit" / "abort"

Three-phase commit (3PC)