1
Lecture 10: Distributed Databases – Replication and Fragmentation
Advanced Databases CG096
Nick Rossiter
2
Overview
Last week:
Saw the difficulty in handling logical relationships between distributed information
Potential solutions such as a federated DDBMS
This week:
Look at an area where distributed databases are used extensively: replication, used for backup and for improving reliability of service, for example in a mirror site
3
Strategies for Data Allocation 1: Centralised
Single database, with users distributed across the network
Communication costs: high
All data access by users is over the network; there are no local references
Reliability and availability: low
Failure of the central site means no access to the entire database system
Storage costs: minimal, as there is no duplication
Performance: likely to be unsatisfactory
4
Strategies for Data Allocation 2: Fragmented
Database distributed by fragments (disjoint views)
Communication costs: low
Fragments are located near their main users (given a good design)
Reliability and availability: vary depending on which site has failed
Failure of one site loses the fragments situated there; other fragments continue to be available
Storage costs: minimal, as there is no duplication
Performance: likely to be satisfactory, and better than centralised as there is less network traffic
5
Strategies for Data Allocation 3: Complete Replication
Database completely copied to each site
Communication costs: high for updates, low for reads
Updates need to be propagated through the system
Reliability and availability: high
Can switch from a failed site to another
Storage costs: high, as there is complete duplication
Performance: high for reads; potentially poor for updates, because of the need to propagate them
6
Strategies for Data Allocation 4: Selective Replication
Fragments are selectively replicated
Communication costs: low (given a good design)
Reliability and availability: vary depending on which site has failed
Failure of one site loses the fragments held only there; replicated fragments and fragments at other sites continue to be available
Storage costs: not minimal, as some fragments are duplicated, but lower than with complete replication
Performance: likely to be satisfactory, and better than centralised as there is less network traffic
7
Fragmentation: Further Details
A fragment is a view on a table. There are two main types:
Horizontal (classification by value): a subset of tuples, obtained by a restrict operation (algebra) or a WHERE clause (SQL)
Vertical (classification by property): a subset of columns, obtained by a project operation (algebra) or a SELECT clause (SQL)
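A minimal sketch of the two main fragment types, assuming a table is held in memory as a list of dicts. The `STAFF` table and the helper names `horizontal_fragment` and `vertical_fragment` are hypothetical, chosen only for illustration; a real DDBMS would define these fragments as views in SQL.

```python
# Hypothetical Staff table: each row is a dict; staff_no is the primary key.
STAFF = [
    {"staff_no": "S1", "name": "Ann", "branch": "Newcastle", "salary": 30000},
    {"staff_no": "S2", "name": "Bob", "branch": "London", "salary": 28000},
    {"staff_no": "S3", "name": "Cat", "branch": "Newcastle", "salary": 35000},
]


def horizontal_fragment(table, predicate):
    """Restrict: keep only the rows satisfying the predicate (like WHERE)."""
    return [row for row in table if predicate(row)]


def vertical_fragment(table, columns):
    """Project: keep only the named columns (like a SELECT list).

    Including the primary key keeps every projected row distinct.
    """
    return [{col: row[col] for col in columns} for row in table]


newcastle = horizontal_fragment(STAFF, lambda r: r["branch"] == "Newcastle")
pay = vertical_fragment(STAFF, ["staff_no", "salary"])
print([r["staff_no"] for r in newcastle])  # ['S1', 'S3']
print(pay[1])  # {'staff_no': 'S2', 'salary': 28000}
```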
8
Other Forms of Fragmentation
Mixed (classification by both value and property): both horizontal and vertical fragmentation are applied to obtain a single fragment
Derived (association): an expression such as a join connects the fragments
None: the whole of a table appears without change in a view
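The sketch below illustrates mixed and derived fragments on hypothetical `BRANCH` and `STAFF` tables. For the derived fragment it uses a semijoin, the usual textbook way of fragmenting one table according to the fragments of a related table; that choice of operation is an assumption, since the slide only says "an expression such as a join".

```python
# Hypothetical tables; branch_no links Staff rows to their Branch.
BRANCH = [
    {"branch_no": "B1", "city": "Newcastle"},
    {"branch_no": "B2", "city": "London"},
]
STAFF = [
    {"staff_no": "S1", "name": "Ann", "branch_no": "B1", "salary": 30000},
    {"staff_no": "S2", "name": "Bob", "branch_no": "B2", "salary": 28000},
    {"staff_no": "S3", "name": "Cat", "branch_no": "B1", "salary": 35000},
]


def mixed_fragment(table, predicate, columns):
    """Restrict then project: one fragment defined by both value and property."""
    return [{c: row[c] for c in columns} for row in table if predicate(row)]


def derived_fragment(member, owner_fragment, key):
    """Semijoin: keep member rows whose key matches a row of the owner fragment."""
    keys = {row[key] for row in owner_fragment}
    return [row for row in member if row[key] in keys]


# Fragment Branch by city, then derive the matching Staff fragment from it.
north_branches = [b for b in BRANCH if b["city"] == "Newcastle"]
north_staff = derived_fragment(STAFF, north_branches, "branch_no")
north_pay = mixed_fragment(STAFF, lambda r: r["branch_no"] == "B1",
                           ["staff_no", "salary"])
```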
9
Why fragment?
Most applications use only part of the data in a table
To minimise network traffic, do not send more data than is strictly necessary to any site
Data not required by an application is not visible to it, which enhances security
10
Factors against fragmentation
Performance: may be affected adversely when some applications need to reconstruct fragments into larger units
Integrity: more difficult to control, as dependencies may be scattered across fragments
11
Three rules for fragmentation: R1
R1) Completeness
If a table T is decomposed into fragments, every value found in T must be found in at least one of the fragments
Otherwise data is lost; completeness guarantees that fragmentation as a whole loses no data
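Rule R1 can be checked mechanically. This is a minimal sketch, assuming horizontal fragments are lists of row tuples and vertical fragments are described by their column lists; the function names and the sample data are hypothetical.

```python
def is_complete_horizontal(table, fragments):
    """R1 for horizontal fragments: every row of T lies in some fragment."""
    covered = set().union(*fragments)
    return set(table) <= covered


def is_complete_vertical(columns, fragment_columns):
    """R1 for vertical fragments: every column of T lies in some fragment."""
    covered = set().union(*fragment_columns)
    return set(columns) <= covered


# Hypothetical Staff rows: (staff_no, branch, salary).
STAFF = [("S1", "Newcastle", 30000), ("S2", "London", 28000), ("S3", "Leeds", 35000)]
by_city = [[r for r in STAFF if r[1] == city] for city in ("Newcastle", "London")]
```

With only the Newcastle and London fragments, the Leeds row is not covered, so `is_complete_horizontal(STAFF, by_city)` is false: that fragmentation would lose data.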
12
Three rules for fragmentation: R2
R2) Reconstruction
It must be possible to reconstruct T from the fragments using a relational operation (typically a natural join for vertical fragments, or a union for horizontal fragments)
Otherwise the decomposition into fragments is lossy
Functional dependencies must be preserved
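A minimal sketch of reconstruction, assuming horizontal fragments hold row tuples and vertical fragments hold dicts that share a primary-key column. Union rebuilds a horizontally fragmented table; a natural join on the key rebuilds a vertically fragmented one. The function names are hypothetical.

```python
def reconstruct_horizontal(fragments):
    """Union of horizontal fragments (duplicates removed, as in relational union)."""
    rows = set()
    for fragment in fragments:
        rows |= set(fragment)
    return rows


def reconstruct_vertical(left, right, key):
    """Natural join of two vertical fragments on their shared key column."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Hypothetical Staff table split vertically; staff_no is kept in both fragments.
original = [
    {"staff_no": "S1", "name": "Ann", "salary": 30000},
    {"staff_no": "S2", "name": "Bob", "salary": 28000},
]
names = [{"staff_no": r["staff_no"], "name": r["name"]} for r in original]
pay = [{"staff_no": r["staff_no"], "salary": r["salary"]} for r in original]
```

Because each vertical fragment keeps the primary key, the join pairs every name with exactly one salary and the original rows come back unchanged.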
13
Three rules for fragmentation: R3
R3) Disjointness
A data item may not appear in more than one fragment unless it is a component of a primary key
This avoids duplication and potential inconsistency (although transactions should in any case prevent the latter)
Duplicating the primary key across fragments allows the table to be reconstructed
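Rule R3 can also be checked mechanically. In this sketch, assuming the same representations as before (column lists for vertical fragments, row tuples for horizontal ones), key columns may repeat across vertical fragments but every other column and every row must appear in only one fragment. The function names are hypothetical.

```python
from collections import Counter


def disjoint_vertical(fragment_columns, primary_key):
    """R3: no non-key column appears in more than one vertical fragment."""
    counts = Counter(
        col for cols in fragment_columns for col in set(cols) if col not in primary_key
    )
    return all(n == 1 for n in counts.values())


def disjoint_horizontal(fragments):
    """R3: no row appears in more than one horizontal fragment."""
    seen = set()
    for fragment in fragments:
        rows = set(fragment)
        if seen & rows:  # this fragment repeats a row held by an earlier one
            return False
        seen |= rows
    return True
```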
14
Strategy for Designing a Partially Replicated Distributed Database 1
Design the global database using a standard methodology
Examine the regional distribution of the business
What data should be held by each part of the business?
Some data is only used locally (not exported, as in a federated DDBMS)
Some data is mostly used locally
15
Strategy for Designing a Partially Replicated Distributed Database 2
Transactions give many clues as to the ideal placement of fragments:
a transaction will perform slowly if it requires data from different sites, unless the network connecting them is very fast
a transaction that replicates many updates will perform slowly if there is frequent contention for resources (locking)
frequently used transactions should be optimised; infrequently used ones can be ignored
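One simple way to turn these clues into a placement is sketched below: weight each fragment's use by the frequency of the transactions that touch it, ignore infrequent transactions, and place each fragment at the site with the highest demand. The workload figures, fragment names and threshold are all hypothetical, and this greedy heuristic is an illustrative assumption rather than a method given in the lecture.

```python
from collections import defaultdict

# Hypothetical workload: (originating site, fragments accessed, runs per day).
WORKLOAD = [
    ("Newcastle", {"staff_north", "property_north"}, 500),
    ("London", {"staff_south"}, 400),
    ("London", {"staff_north"}, 20),
    ("Newcastle", {"staff_south"}, 2),  # infrequent, so ignored below
]


def place_fragments(workload, min_frequency=10):
    """Place each fragment at the site whose frequent transactions use it most."""
    demand = defaultdict(lambda: defaultdict(int))
    for site, fragments, per_day in workload:
        if per_day < min_frequency:
            continue  # infrequently used transactions can be ignored
        for fragment in fragments:
            demand[fragment][site] += per_day
    return {frag: max(sites, key=sites.get) for frag, sites in demand.items()}
```

A fuller design would also weigh update frequency against read frequency before deciding to replicate a fragment at more than one site.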
16
Strategy for Designing a Partially Replicated Distributed Database 3
Decide which relations are not to be fragmented. These will normally be replicated everywhere, as they are then easy to update and their integrity is easy to maintain.
Fragment the remaining relations to suit locality and transactions
17
Transparencies in DDBMS
Transparency hides details at lower levels (often implementation details) from the user
Four main types: distribution, transaction, performance and DBMS
18
Distribution Transparency
The DDB is perceived by the user as a single, logical unit even though the data is distributed over several sites and fragmented in various ways
19
Significance of Full Distribution Transparency
The user does not need to know anything about the distribution techniques
The user addresses the global schema in queries
The user will, however, not understand why some queries take longer than others
The highest form of distribution transparency is termed fragmentation transparency
20
Reduced forms of distribution transparency
Location transparency:
the user needs to know about fragmentation but not about placement at sites
the user does not need to know which replicas exist
Local mapping transparency (the most limited transparency): the user needs to know about both fragmentation and sites
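The three levels differ in what the user must name in a request. This sketch assumes a hypothetical catalogue mapping each global relation to its fragments and the sites holding copies; it simply picks the first listed copy, which is a simplification.

```python
# Hypothetical catalogue: global relation -> fragment -> sites holding a copy.
CATALOGUE = {
    "Staff": {"staff_north": ["Newcastle"], "staff_south": ["London", "Bristol"]},
}


def fragmentation_transparent(relation):
    """User names only the global relation; the system finds fragments and sites."""
    return [(frag, sites[0]) for frag, sites in CATALOGUE[relation].items()]


def location_transparent(fragment):
    """User names the fragment; the system still chooses the site (first copy here)."""
    for fragments in CATALOGUE.values():
        if fragment in fragments:
            return (fragment, fragments[fragment][0])
    raise KeyError(fragment)


def local_mapping(fragment, site):
    """User names both fragment and site; the system only checks the copy exists."""
    for fragments in CATALOGUE.values():
        if site in fragments.get(fragment, []):
            return (fragment, site)
    raise LookupError(f"{fragment} is not held at {site}")
```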
21
Transaction Transparency
Ensures that all transactions maintain the DDB’s integrity and consistency
Each transaction is divided into subtransactions, one for each site involved
Subtransactions usually execute in parallel, which gains efficiency
More complicated than in a centralised system
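A minimal sketch of splitting one global request into per-site subtransactions that run in parallel. Threads stand in for remote sites, and the `SITES` data is hypothetical; the example is read-only, so it deliberately leaves out the commit coordination that updating subtransactions would need.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site data: each site holds part of the Staff relation.
SITES = {
    "Newcastle": [("S1", 30000), ("S3", 35000)],
    "London": [("S2", 28000)],
}


def subtransaction(site, min_salary):
    """Runs at one site: select the local rows that satisfy the condition."""
    return [row for row in SITES[site] if row[1] >= min_salary]


def global_transaction(min_salary):
    """Issue one subtransaction per site, run them in parallel, merge results."""
    with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
        futures = [pool.submit(subtransaction, site, min_salary) for site in SITES]
        results = [f.result() for f in futures]
    return sorted(row for part in results for row in part)


print(global_transaction(29000))  # [('S1', 30000), ('S3', 35000)]
```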
22
Forms of Transaction Transparency
Concurrency transparency: all concurrent transactions (centralised and distributed) execute independently
The DDBMS must ensure that:
each subtransaction is executed in the normal spirit of transactions (ACID)
the subtransactions as a whole, forming one transaction, are executed ACID-style
the mixture of subtransactions and whole transactions is executed ACID-style
23
Transactions: Problems with Replication
Failure transparency: users are unaware of problems, such as the one below, encountered during transaction execution
If, say, 6 copies of a data item (at 6 sites) need to be updated, there is a problem if only 5 are currently reachable:
COMMIT must be delayed until all sites have been processed, otherwise the data becomes inconsistent
unless delayed (asynchronous) update is allowed
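The 6-copy example can be sketched as follows, assuming each replica is a dict and the set of reachable sites is known. With synchronous update the commit is refused (to be retried later) while any copy is unreachable; with deferred update the reachable copies change now and the function reports which sites still need the update. The function name and parameters are hypothetical.

```python
def update_replicas(replicas, reachable, item, value, allow_deferred=False):
    """Apply an update to every copy of a data item.

    replicas: dict mapping site name -> local store (a dict).
    reachable: set of site names currently reachable.
    Returns the list of sites whose update has been deferred.
    """
    down = [site for site in replicas if site not in reachable]
    if down and not allow_deferred:
        # Synchronous update: committing now would leave the copies
        # inconsistent, so nothing is changed and the commit must wait.
        raise RuntimeError(f"cannot commit: unreachable site(s) {down}")
    for site in replicas:
        if site in reachable:
            replicas[site][item] = value
    return down  # the caller must re-apply the update at these sites later


replicas = {f"site{i}": {"x": 0} for i in range(1, 7)}  # 6 copies
reachable = {f"site{i}" for i in range(1, 6)}           # only 5 reachable
```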
24
Performance Transparency
Requires:
the DDBMS to determine the most cost-effective way to handle a request: which fragment to use, which copy of a fragment to use (if replicated), and which site to use
avoidance of any performance degradation compared with a centralised system
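Choosing which copy of a replicated fragment to use can be sketched as a minimum-cost choice over the available replicas. The cost table, site names and `choose_copy` are hypothetical; a real optimiser would estimate costs from statistics and network characteristics.

```python
# Hypothetical estimated cost of fetching data from a site to a requesting site.
COST = {
    ("London", "London"): 0,
    ("London", "Bristol"): 15,
    ("London", "Newcastle"): 40,
}
REPLICAS = {"staff_south": ["Bristol", "London"], "staff_north": ["Newcastle"]}


def choose_copy(fragment, requesting_site, available):
    """Pick the cheapest currently available copy of a fragment."""
    candidates = [s for s in REPLICAS[fragment] if s in available]
    if not candidates:
        raise LookupError(f"no available copy of {fragment}")
    return min(candidates, key=lambda s: COST[(requesting_site, s)])
```

If the London copy is unavailable, the same call falls back to the Bristol copy without the user needing to know that replicas exist.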
25
DBMS Transparency
Hides knowledge of which DBMS is being used
The most difficult transparency of all, particularly with heterogeneous models
See problems highlighted in Lecture 9: global schema integration, federated databases, multidatabase languages
26
Replication Servers
Copying and maintenance of data on multiple servers
Replication: the process of generating and reproducing multiple copies of data at one or more sites
Servers: provide the file resources, i.e. the distributed database
27
Benefits of Replication
Increased reliability
Better data availability
Potential for better performance (with good design)
Warm stand-by: as in a mirror site, shadowing the actions of the main site and cutting in if the main site crashes
28
Timing of Replication
Synchronous: immediate, according to some common signal such as time
Ideal, as it ensures immediate consistency
Assumes availability of all sites
Asynchronous: independent, with delays ranging from a few seconds to several days
Immediate consistency is not achieved
More flexible, as not all sites need to be available at any one time
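Asynchronous replication can be sketched as a primary copy plus a queue of pending updates that a replica applies later, in order, whenever it is reachable. `AsyncReplica` is a hypothetical name; the point is that between `write` and `propagate` the two copies differ, which is exactly the loss of immediate consistency noted above.

```python
from collections import deque


class AsyncReplica:
    """The primary applies updates at once; the replica catches up from a queue."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # updates not yet propagated, oldest first

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))  # propagation is delayed

    def propagate(self, max_updates=None):
        """Apply queued updates in order, e.g. once the replica site is reachable."""
        applied = 0
        while self.pending and (max_updates is None or applied < max_updates):
            key, value = self.pending.popleft()
            self.replica[key] = value
            applied += 1
        return applied
```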
29
Types of Data Replicated
Across heterogeneous data models: mapping required (hard)
Object replication: more varied than just base data; also auxiliary structures such as indexes, and stored procedures and functions
Scalability: no volume restrictions
30
Replication Administration
Subscription mechanism: allows a permitted user to subscribe to replicated data/objects
Initialisation mechanism: allows for the initialisation of a target replica
31
Ownership of Replicated Data 1: Master/Slave
Master site:
primary owner of the replicated data, with the sole right to change it
publish-and-subscribe procedure
asynchronous replication, as slave sites receive copies of the data
Slave site:
receives read-only data from the master site
slaves can be used as mobile clients
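A minimal publish-and-subscribe sketch of master/slave ownership. `MasterSite` and `SlaveSite` are hypothetical classes: only the master changes data, subscribing gives a slave its initial copy, and slaves see changes only after the master publishes (asynchronously) and refuse local updates.

```python
class MasterSite:
    """Primary owner: the only site allowed to change the data."""

    def __init__(self):
        self.data = {}
        self.subscribers = []

    def subscribe(self, slave):
        self.subscribers.append(slave)
        slave.receive(dict(self.data))  # initialise the new copy

    def update(self, key, value):
        self.data[key] = value  # slaves see this only after publish()

    def publish(self):
        for slave in self.subscribers:
            slave.receive(dict(self.data))


class SlaveSite:
    """Holds a read-only copy received from the master."""

    def __init__(self):
        self._data = {}

    def receive(self, snapshot):
        self._data = snapshot

    def read(self, key):
        return self._data.get(key)

    def update(self, key, value):
        raise PermissionError("slave copies are read-only; update at the master")
```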
32
Ownership of Replicated Data 2: Workflow Ownership
Flexible master designation
Dynamic ownership model: the right to update data moves along the chain of command (replicating sites)
For example, as an order is processed, the master right moves to each department in turn
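Workflow ownership can be sketched as a data item whose update right is held by one site at a time and handed along a fixed chain. The department names and the `WorkflowItem` class are hypothetical, following the order-processing example above.

```python
class WorkflowItem:
    """An order whose master (update) right passes from department to department."""

    def __init__(self, chain):
        self.chain = list(chain)  # hypothetical chain of command
        self.stage = 0
        self.data = {}

    @property
    def owner(self):
        return self.chain[self.stage]

    def update(self, site, key, value):
        if site != self.owner:
            raise PermissionError(f"{site} is not the current owner ({self.owner})")
        self.data[key] = value

    def hand_on(self):
        """Pass the master right to the next site in the chain."""
        if self.stage + 1 >= len(self.chain):
            raise RuntimeError("workflow already complete")
        self.stage += 1


order = WorkflowItem(["Sales", "Warehouse", "Dispatch"])
order.update("Sales", "quantity", 3)
order.hand_on()  # Warehouse now holds the update right
```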
33
Ownership of Replicated Data 3: Update-Anywhere
Peer-to-peer model: multiple sites can update data
Conflict resolution required
More complex implementation
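Conflict resolution can take many forms. This sketch uses last-writer-wins, one common policy chosen here purely for illustration: each update carries a timestamp (assumed to come from a logical clock) and the writing site's name, which breaks ties so that every peer reaches the same result. `Version`, `resolve` and `merge` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: object
    timestamp: int  # assumed logical clock value attached by the writing site
    site: str       # breaks timestamp ties deterministically


def resolve(a: Version, b: Version) -> Version:
    """Last-writer-wins: keep the later update; break ties by site name."""
    return max(a, b, key=lambda v: (v.timestamp, v.site))


def merge(local: dict, remote: dict) -> dict:
    """Merge two peers' copies key by key, resolving conflicting updates."""
    merged = dict(local)
    for key, version in remote.items():
        merged[key] = resolve(merged[key], version) if key in merged else version
    return merged
```

Because `resolve` depends only on the two versions, merging in either direction gives the same result, so peers converge; the cost is that the losing update is silently discarded.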
34
Distribution and Replication in Oracle 9i
Materialised views (formerly known as snapshots)
Views are updated by a refresh mechanism, with a variable frequency to suit the application:
Fast: based on identified changes
Complete: replaces the existing data
Force: tries Fast; if that is not possible, does Complete
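The three refresh modes can be modelled in a few lines. This is a toy simulation under stated assumptions, not Oracle's implementation or API: the `MaterialisedView` class is hypothetical, the base table is a dict of rows, and a fast refresh is assumed to be possible only while a change log is intact.

```python
class MaterialisedView:
    """Toy model of a materialised view over a base table (dict: key -> row)."""

    def __init__(self, base, predicate):
        self.base = base
        self.predicate = predicate
        self.change_log = []   # keys changed since the last refresh
        self.log_valid = True  # assumed: fast refresh needs an intact log
        self.rows = {}
        self.refresh_complete()

    def apply_change(self, key, row):
        """Change the base table (row=None deletes) and log the changed key."""
        if row is None:
            self.base.pop(key, None)
        else:
            self.base[key] = row
        self.change_log.append(key)

    def refresh_complete(self):
        """Complete: discard the stored rows and recompute from the base table."""
        self.rows = {k: r for k, r in self.base.items() if self.predicate(r)}
        self.change_log.clear()
        self.log_valid = True

    def refresh_fast(self):
        """Fast: apply only the identified changes recorded in the log."""
        if not self.log_valid:
            raise RuntimeError("change log unavailable: fast refresh not possible")
        for key in set(self.change_log):
            row = self.base.get(key)
            if row is not None and self.predicate(row):
                self.rows[key] = row
            else:
                self.rows.pop(key, None)
        self.change_log.clear()

    def refresh_force(self):
        """Force: try Fast; if that is not possible, do Complete."""
        try:
            self.refresh_fast()
            return "fast"
        except RuntimeError:
            self.refresh_complete()
            return "complete"
```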
35
Oracle 9i Transparency
Does not support fragmentation transparency
Supports site (location) transparency
36
Summary of Distributed DBMS
An area under keen development, as it improves:
availability of data
overall reliability of the system
performance (with good design)
However, disadvantages remain:
implementation can be complex (expensive)
heterogeneity in models is poorly handled
Use for replicating data is the main application today