
Lecture 10: Distributed Databases – Replication and Fragmentation

Advanced Databases CG096

Nick Rossiter


Overview

Last week:
- Saw difficulty in handling logical relationships between distributed information
- Potential solutions such as federated DDBMS

This week:
- Look at an area where distributed databases are extensively used: replication
  - For backup
  - For improving reliability of service, such as for a mirror site


Strategies for Data Allocation 1

Centralised
- Single database, users distributed across network
- High communication costs
  - All data access by users over network
  - No local references
- Low reliability and low availability
  - Failure of central site leads to no access to entire database system
- Storage costs
  - No duplication so minimal
- Performance
  - Likely to be unsatisfactory


Strategies for Data Allocation 2

Fragmented
- Database distributed by fragments (disjoint views)
- Low communication costs
  - Fragments located near their main users (if good design)
- Reliability and availability vary depending on failed site
  - Failure of one part loses fragments situated there
  - Other fragments continue to be available
- Storage costs
  - No duplication so minimal
- Performance
  - Likely to be satisfactory: better than centralised as less network traffic


Strategies for Data Allocation 3

Complete Replication
- Database completely copied to each site
- Communication costs:
  - High for update, low for read
  - Need to propagate updates through system
- High reliability and high availability
  - Can switch from failed site to another
- High storage costs
  - Complete duplication
- Performance
  - High for reads
  - Potentially poor for updates with propagation of updates


Strategies for Data Allocation 4

Selective Replication
- Fragments are selectively replicated
- Communication costs:
  - Low (if good design)
- Reliability and availability vary depending on failed site
  - Failure of one part loses fragments situated there, unless they are replicated elsewhere
  - Other fragments continue to be available
- Storage costs
  - Duplication of some fragments means that it is not minimal but less than with complete replication
- Performance
  - Likely to be satisfactory: better than centralised as less network traffic


Fragmentation -- Further Details

- A fragment is a view on a table.
- Two main types
  - Horizontal (classification by value)
    - subset of tuples obtained by restrict operation (algebra) or WHERE clause (SQL)
  - Vertical (classification by property)
    - subset of columns obtained by project operation (algebra) or SELECT clause (SQL)
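The two main types can be sketched with SQL views, here run through Python's built-in sqlite3 module. The table staff, its columns and its rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical global table
    CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, name TEXT,
                        branch TEXT, salary REAL);
    INSERT INTO staff VALUES (1, 'Ann', 'Newcastle', 30000),
                             (2, 'Bob', 'London', 28000),
                             (3, 'Cy', 'Newcastle', 35000);

    -- Horizontal fragment: restrict (WHERE) keeps a subset of tuples.
    CREATE VIEW staff_newcastle AS
        SELECT * FROM staff WHERE branch = 'Newcastle';

    -- Vertical fragment: project (column list) keeps a subset of columns.
    CREATE VIEW staff_pay AS
        SELECT staff_no, salary FROM staff;
""")

horizontal = conn.execute(
    "SELECT * FROM staff_newcastle ORDER BY staff_no").fetchall()
vertical = conn.execute(
    "SELECT * FROM staff_pay ORDER BY staff_no").fetchall()
```

The vertical fragment keeps the primary key staff_no so that it can later be rejoined with other vertical fragments of the same table.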


Other Forms of Fragmentation

- Mixed (classification by both value and property)
  - both horizontal and vertical fragmentation are used to obtain a single fragment
- Derived (association)
  - an expression such as a join connects the fragments
- None
  - The whole of a table appears without change in a view
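A minimal sketch of mixed and derived fragmentation, again using SQL views through Python's sqlite3 module. The tables branch and staff and all their values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical global tables
    CREATE TABLE branch (branch_no INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, name TEXT, salary REAL,
                        branch_no INTEGER REFERENCES branch(branch_no));
    INSERT INTO branch VALUES (10, 'Newcastle'), (20, 'London');
    INSERT INTO staff VALUES (1, 'Ann', 30000, 10),
                             (2, 'Bob', 28000, 20),
                             (3, 'Cy', 35000, 10);

    -- Mixed: restrict (rows) and project (columns) in a single fragment.
    CREATE VIEW staff_pay_b10 AS
        SELECT staff_no, salary FROM staff WHERE branch_no = 10;

    -- Derived: staff rows are selected through a join with a branch fragment.
    CREATE VIEW branch_newcastle AS
        SELECT * FROM branch WHERE city = 'Newcastle';
    CREATE VIEW staff_newcastle AS
        SELECT s.* FROM staff AS s
        JOIN branch_newcastle AS b ON s.branch_no = b.branch_no;
""")

mixed = conn.execute(
    "SELECT * FROM staff_pay_b10 ORDER BY staff_no").fetchall()
derived = conn.execute(
    "SELECT staff_no, name FROM staff_newcastle ORDER BY staff_no").fetchall()
```

In the derived case the staff fragment follows the branch fragment, so staff rows can be stored at the same site as the branch they belong to.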


Why fragment?

- Most applications use only part of the data in a table
- To minimise network traffic, do not send more data than is strictly necessary to any site
- Data not required by an application is not visible to it, enhancing security


Factors against fragmentation

- Performance
  - may be affected adversely by the need for some applications to reconstruct fragments into larger units
- Integrity
  - more difficult to control with dependencies possibly scattered across fragments


Three rules for fragmentation R1

R1) Completeness
- If a table T is decomposed into fragments, every value found in T must be found in at least one of the fragments
- Otherwise get loss of data
- So no loss of data as a whole in fragmentation


Three rules for fragmentation R2

R2) Reconstruction
- It must be possible to reconstruct T from the fragments using a relational operation (typically a natural join for vertical fragments, a union for horizontal fragments)
- Otherwise decomposition into fragments is lossy
- Functional dependencies are preserved


Three rules for fragmentation R3

R3) Disjointness
- A data item may not appear in more than one fragment unless it is a component of a primary key
- Avoids duplication and potential inconsistency, although transactions should avoid the latter
- Primary key duplication allows reconstructions to be made
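The three rules can be checked mechanically for a small vertical fragmentation. The sketch below is plain Python on a hypothetical table T with primary key staff_no; the helper names project, natural_join and cells are invented for this illustration.

```python
# Hypothetical table T; staff_no is the primary key.
T = [
    {"staff_no": 1, "name": "Ann", "salary": 30000},
    {"staff_no": 2, "name": "Bob", "salary": 28000},
]
KEY = "staff_no"

def project(rows, columns):
    """Vertical fragment: keep only the named columns of every row."""
    return [{c: r[c] for c in columns} for r in rows]

def natural_join(left, right, key):
    """Rejoin two vertical fragments on the shared primary key."""
    by_key = {r[key]: r for r in right}
    return [{**l, **by_key[l[key]]} for l in left if l[key] in by_key]

def cells(rows, key):
    """Every stored value, identified by (key value, column, value)."""
    return {(r[key], c, v) for r in rows for c, v in r.items()}

f1 = project(T, ["staff_no", "name"])
f2 = project(T, ["staff_no", "salary"])

# R1 Completeness: every value in T is found in at least one fragment.
complete = cells(T, KEY) <= cells(f1, KEY) | cells(f2, KEY)
# R2 Reconstruction: the natural join of the fragments gives back T.
reconstructed = natural_join(f1, f2, KEY) == T
# R3 Disjointness: the only column the fragments share is the primary key.
disjoint = set(f1[0]) & set(f2[0]) == {KEY}
```

Dropping staff_no from f2 would satisfy R3 more strictly but break R2, which is why the primary key is the permitted exception.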


Strategy for Designing a Partially Replicated Distributed Database 1

- Design global database using standard methodology
- Examine regional distribution of business
  - What data should be held by each part of business?
  - Some data is only used locally (not exported, as in Federated DDBMS)
  - Some data is mostly used locally


Strategy for Designing a Partially Replicated Distributed Database 2

- Transactions give many clues as to ideal placement of fragments
  - a transaction will perform slowly if it requires data from different sites, unless the network connecting them is very fast
  - a transaction performing much replication of updates will perform slowly if there is frequent contention for resources (locking)
  - frequently used transactions should be optimised; infrequently used ones can be ignored


Strategy for Designing a Partially Replicated Distributed Database 3

- Decide on which relations are not to be fragmented.
  - These will normally be replicated everywhere, as they are easy to update and to maintain integrity.
- Fragment remaining relations to suit:
  - locality
  - transactions


Transparencies in DDBMS

- Transparency hides details at lower levels (often implementation ones) from user
- Four main types:
  - Distribution
  - Transaction
  - Performance
  - DBMS


Distribution Transparency

- The DDB is perceived by the user as a single, logical unit even though the data is:
  - distributed over several sites
  - fragmented in various ways


Significance of Full Distribution Transparency

- User does not need to know anything about the distribution techniques
- User addresses global schema in queries
- User will, however, not understand why some queries take longer than others
- Highest form of distribution transparency is termed fragmentation transparency


Reduced forms of distribution transparency

- Location transparency
  - user needs to know about fragmentation but not about placements at sites
  - user does not need to know which replications exist
- Local mapping transparency
  - the most limited transparency
  - user needs to know about fragmentation and sites
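The three levels differ in how much the user has to name in a request. The sketch below models that in plain Python. The site names, fragment names and the functions read_at, read_fragment and read_table are all hypothetical; a real DDBMS would resolve these names through its catalogue.

```python
# Hypothetical layout: table staff is split into two horizontal fragments,
# and fragment staff_north is replicated at two sites.
SITES = {
    "newcastle": {"staff_north": [{"staff_no": 1, "name": "Ann"}]},
    "leeds":     {"staff_north": [{"staff_no": 1, "name": "Ann"}]},
    "london":    {"staff_south": [{"staff_no": 2, "name": "Bob"}]},
}
FRAGMENTS_OF = {"staff": ["staff_north", "staff_south"]}

def read_at(site, fragment):
    """Local mapping transparency: the user names fragment and site."""
    return SITES[site][fragment]

def read_fragment(fragment):
    """Location transparency: the user names the fragment only;
    the system finds a site that holds a copy."""
    for held in SITES.values():
        if fragment in held:
            return held[fragment]
    raise KeyError(fragment)

def read_table(table):
    """Fragmentation transparency: the user names the global table only."""
    rows = []
    for fragment in FRAGMENTS_OF[table]:
        rows.extend(read_fragment(fragment))
    return rows

def find_name(rows, staff_no):
    """Look up one member of staff in whatever rows were returned."""
    return next((r["name"] for r in rows if r["staff_no"] == staff_no), None)
```

The same lookup for staff number 2 can be written as find_name(read_table("staff"), 2), find_name(read_fragment("staff_south"), 2) or find_name(read_at("london", "staff_south"), 2), with the user supplying progressively more distribution detail.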


Transaction Transparency

- Ensures that all transactions maintain the DDB's integrity and consistency
- Each transaction is divided into subtransactions
  - one subtransaction for each site
  - usually execute subtransactions in parallel
  - gains in efficiency
- More complicated than in centralised system
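A minimal sketch of dividing one request into a subtransaction per site and running them in parallel, using Python's standard concurrent.futures module. The site names and values are hypothetical, and the sketch covers only the parallel execution; it does not implement the atomic commit that a real DDBMS would need across sites.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site data; each site holds part of the global table.
SITE_DATA = {
    "newcastle": [120, 80],
    "london": [200],
    "leeds": [50, 25, 25],
}

def subtransaction(site):
    """Work done at one site: here, a local sum over that site's rows."""
    return sum(SITE_DATA[site])

def run_transaction(sites):
    """Run one subtransaction per site in parallel and combine the results."""
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        partials = list(pool.map(subtransaction, sites))
    return sum(partials)

total = run_transaction(list(SITE_DATA))
```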


Forms of Transaction Transparency

- Concurrency Transparency
  - all concurrent transactions (centralised and distributed) execute independently
- DDBMS must ensure that:
  - each subtransaction is executed in the normal spirit of transactions (ACID)
  - the subtransactions as a whole, forming one transaction, are executed ACID-style
  - the mixture of subtransactions and whole transactions is executed ACID-style


Transactions -- problems with replication

- Failure Transparency
  - Users are unaware of problems such as that below encountered during transaction execution
- If, say, 6 copies of a data item (at 6 sites) need to be updated:
  - problems if only 5 are currently reachable
  - need to delay COMMIT until all sites processed
  - otherwise inconsistent data
  - unless delayed asynchronous update is allowed
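The six-copy case can be sketched as a small decision function. This is an illustration in plain Python, not a commit protocol: the function propagate_update and the site names are hypothetical.

```python
def propagate_update(reachable, synchronous):
    """Decide the outcome of updating every copy of one replicated item.

    `reachable` maps site name -> True/False.
    Returns (outcome, sites still waiting for the update).
    """
    down = sorted(site for site, up in reachable.items() if not up)
    if not down:
        return "commit", []
    if synchronous:
        # All copies must change together, so COMMIT waits for the rest.
        return "delay", down
    # Asynchronous: commit at the reachable sites, queue the update for others.
    return "commit", down

copies = {f"site{i}": True for i in range(1, 7)}
copies["site6"] = False          # only 5 of the 6 copies are reachable

sync_result = propagate_update(copies, synchronous=True)
async_result = propagate_update(copies, synchronous=False)
```

With synchronous propagation the transaction is held up by the one unreachable site; with asynchronous propagation it commits, at the price of site6 holding stale data until the queued update arrives.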


Performance transparency

Requires:
- the DDBMS to determine the most cost-effective way to handle a request
  - which fragment to use
  - (if replicated) which copy of a fragment to use
  - which site to use
- avoidance of any performance degradation compared with a centralised system
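Choosing which copy of a fragment to use can be sketched as a lowest-cost selection. The catalogue COPIES, the cost table COST_FROM and the function cheapest_site are hypothetical; a real optimiser would estimate costs from statistics rather than a fixed table.

```python
# Hypothetical catalogue: which sites hold a copy of each fragment.
COPIES = {"staff_north": ["newcastle", "leeds"], "staff_south": ["london"]}

# Hypothetical cost of shipping a result from each site to the user.
COST_FROM = {"newcastle": 5, "leeds": 12, "london": 40}

def cheapest_site(fragment, available):
    """Pick the lowest-cost available site that holds a copy of the fragment."""
    candidates = [s for s in COPIES[fragment] if s in available]
    if not candidates:
        raise LookupError(f"no available copy of {fragment}")
    return min(candidates, key=COST_FROM.__getitem__)
```

For example, cheapest_site("staff_north", {"newcastle", "leeds", "london"}) selects newcastle, and falls back to leeds if newcastle is not in the available set.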


DBMS transparency

- Hides knowledge of which DBMS is being used
- The most difficult transparency of all
  - particularly with heterogeneous models
- See problems highlighted in lecture 9:
  - Global Schema Integration
  - Federated Databases
  - Multidatabase Languages


Replication Servers

- Copying and maintenance of data on multiple servers
- Replication -- the process of generating and reproducing multiple copies of data at one or more sites
- Servers -- provide the file resources: the distributed database


Benefits of Replication

- Increased reliability
- Better data availability
- Potential for better performance (with good design)
- Warm stand-by
  - As in mirror site, shadowing actions of main site and cutting in if main site crashes


Timing of Replication

- Synchronous
  - Immediate according to some common signal such as time
  - Ideal as ensures immediate consistency
  - Assumes availability of all sites
- Asynchronous
  - Independently with delays ranging from a few seconds to several days
  - Immediate consistency is not achieved
  - More flexible as at any one time not all sites need to be available


Types of data replicated

- Across heterogeneous data models
  - Mapping required (hard)
- Object replication
  - More varied than just base data
  - Also auxiliary structures such as indexes
  - Stored procedures and functions
- Scalability
  - No volume restrictions


Replication administration

- Subscription mechanism
  - Allows a permitted user to subscribe to replicated data/objects
- Initialisation mechanism
  - Allows for the initialisation of a target replication


Ownership of Replicated Data 1

Master/Slave
- Master site
  - Primary owner of replicated data
  - Sole right to change data
  - Publish and subscribe procedure
  - Asynchronous replication as slave sites receive copies of the data
- Slave site
  - Receive read-only data from master site
  - Slaves can be used as mobile clients
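A minimal in-memory sketch of master/slave ownership with publish and subscribe. The classes Master and Slave and their methods are hypothetical names for this illustration; there is no networking, and publish stands in for the asynchronous propagation step.

```python
class Master:
    """Primary owner: the only site allowed to change the data."""

    def __init__(self):
        self.data = {}
        self.subscribers = []

    def subscribe(self, slave):
        self.subscribers.append(slave)

    def update(self, key, value):
        self.data[key] = value

    def publish(self):
        """Asynchronous step: push a copy to subscribers some time later."""
        for slave in self.subscribers:
            slave._receive(dict(self.data))


class Slave:
    """Subscriber: holds a read-only copy and offers no update method."""

    def __init__(self):
        self._copy = {}

    def _receive(self, snapshot):
        self._copy = snapshot

    def read(self, key):
        return self._copy.get(key)


master, slave = Master(), Slave()
master.subscribe(slave)
master.update("price", 10)
stale = slave.read("price")   # the copy has not been refreshed yet
master.publish()
fresh = slave.read("price")
```

The gap between stale and fresh is the window in which a slave, such as a disconnected mobile client, sees out-of-date data.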


Ownership of Replicated Data 2

Workflow Ownership
- Flexible master designation
- Dynamic ownership model
- Right to update data moves along the chain of command (replicating sites)
- For example, as order is processed the master right moves to each department in turn
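The order example can be sketched as a record whose update right is passed along a chain. The class Order, the department names and the method hand_over are hypothetical and chosen only to illustrate the idea.

```python
class Order:
    """A replicated record whose update right moves along a fixed chain."""

    CHAIN = ["sales", "warehouse", "billing"]   # hypothetical chain of command

    def __init__(self):
        self.owner = self.CHAIN[0]
        self.status = "new"

    def update(self, department, status):
        """Only the current owner may change the record."""
        if department != self.owner:
            raise PermissionError(
                f"{department} does not currently own the order")
        self.status = status

    def hand_over(self):
        """Pass the master right to the next department in the chain."""
        position = self.CHAIN.index(self.owner)
        if position + 1 < len(self.CHAIN):
            self.owner = self.CHAIN[position + 1]


order = Order()
order.update("sales", "entered")
order.hand_over()
order.update("warehouse", "picked")
```

At any moment exactly one department holds the master right, so no conflict resolution is needed even though several sites update the record over its lifetime.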


Ownership of Replicated Data 3

Update-anywhere
- Peer-to-peer model
- Multiple sites can update data
- Conflict resolution required
- More complex implementation
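One possible conflict-resolution policy is last-writer-wins, sketched below in plain Python. The lecture does not name a policy, so this choice is an assumption for illustration, as are the timestamps, site names and the functions resolve and merge.

```python
def resolve(local, incoming):
    """Last-writer-wins: keep the version with the later (ts, site) stamp.
    The site name breaks ties so that every peer picks the same winner."""
    return max(local, incoming, key=lambda v: (v["ts"], v["site"]))

def merge(replica, changes):
    """Apply changes received from a peer, resolving conflicts per key."""
    for key, incoming in changes.items():
        if key in replica:
            replica[key] = resolve(replica[key], incoming)
        else:
            replica[key] = incoming
    return replica

# Two peers update the same item independently.
site_a = {"stock": {"value": 7, "ts": 100, "site": "A"}}
site_b = {"stock": {"value": 9, "ts": 105, "site": "B"}}

# Each peer sends the other its changes as they stood before merging.
from_a, from_b = dict(site_a), dict(site_b)
merge(site_a, from_b)
merge(site_b, from_a)
```

After the exchange both peers hold the version written at site B, so the replicas converge, but the update made at site A is silently discarded, which is one reason update-anywhere is harder to implement well.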


Distribution and Replication in Oracle 9i

- Materialised views
  - Formerly known as snapshots
- Views are updated by refresh mechanism
  - Variable frequency to suit application
  - Fast: based on identified changes
  - Complete: replaces existing data
  - Force: tries Fast; if not possible, does Complete
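The relationship between the three refresh options can be modelled in a few lines of Python. This is a sketch of the decision logic only, not Oracle's API: the function refresh, its parameters and the dictionary representation of a view are hypothetical.

```python
def refresh(view, master, change_log, mode="force"):
    """Refresh a local copy (`view`) from `master`.

    `change_log` is a list of (key, value) changes since the last refresh,
    or None when no log is available. Returns the method actually used.
    """
    if mode == "force":
        # Force: try Fast, and fall back to Complete if Fast is not possible.
        mode = "fast" if change_log is not None else "complete"
    if mode == "fast":
        if change_log is None:
            raise ValueError("fast refresh needs a change log")
        for key, value in change_log:   # apply only the identified changes
            view[key] = value
        return "fast"
    if mode == "complete":
        view.clear()                    # replace the existing data
        view.update(master)
        return "complete"
    raise ValueError(f"unknown mode: {mode}")
```

For example, with master {"a": 1, "b": 2} and a view holding {"a": 1}, a force refresh with the log [("b", 2)] uses the fast path, while the same call with no log rebuilds the view completely.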


Oracle 9i transparency

- Does not support
  - Fragmentation transparency
- Supports
  - Site (location) transparency


Summary of Distributed DBMS

- An area under keen development as it improves:
  - Availability of data
  - Overall reliability of system
  - Performance (with good design)
- However, disadvantages remain:
  - Implementation can be complex (expensive)
  - Heterogeneity in models is poorly handled
- Use for replicating data is main application today