Distributed DBMSs – Concepts and Design
Chapter 24 in Textbook
Overview
Concepts. What is a distributed DBMS? Distributed Processing. Homogeneous vs. Heterogeneous.
Functions of a DDBMS. Components of a DDBMS. Advantages and Disadvantages. DDBMS Design.
Fragmentation. Replication. Allocation.
DDBMS Transparencies. Date’s 12 Rules for a DDBMS.
Concepts
Centralized DBMS: a system with a single logical database located at one site under the control of a single DBMS.
Distributed DB: a logically interrelated collection of shared data physically distributed over a computer network.
Applications can be classified into:
Local applications.
Global applications.
Distributed DBMS
Distributed DBMS: the software system that:
manages the distributed DBs.
makes distribution transparent to users.
allows users to access data on their own site as well
as remote sites.
Transparent distribution is the fundamental
principle of DDBMS.
Characteristics of DDBMS
• A collection of logically related shared data.
• The data is split into a number of fragments.
• Fragments may be replicated.
• Fragments/replicas are allocated to sites.
• The sites are linked by a communications network.
• The data at each site is under the control of a DBMS.
• The DBMS at each site can handle local applications.
• Each DBMS participates in at least one global application.
Distributed DBMS Topology
[Figure: Sites 1, 2, 3 and 4 connected by a computer network.]
Data itself is distributed and access to it can be local or remote.
Distributed Processing
[Figure: Sites 1, 2, 3 and 4 connected by a computer network.]
Data itself is centralized but access to it can be local or remote.
Homogeneous vs. Heterogeneous DDBMS
Homogeneous system: all sites use the same DBMS product.
Heterogeneous system: sites may run different DBMS products and data models.
Possible differences between data in different DBs:
• Data type difference.
• Value difference.
• Semantic difference.
Functions of a DDBMS
• Provide access to remote sites and allow transfer of queries and data among the network’s sites.
• Store data distribution details.
• Distributed data processing.
• Security control.
• Concurrency control.
• Recovery services.
Components of a DDBMS
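[Figure: Site 1 and Site 3 connected by a computer network. Each site runs DDBMS software with a data communication component (DC) and a global system catalog (GSC). One site is also shown with a local DBMS (LDBMS) and its database (DB).]
GSC: Global system catalog
DC: Data communication component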
Advantages of DDBMS
• Reflects organizational structure.
• Improved shareability and local autonomy.
• Improved availability.
• Improved reliability.
• Improved performance.
Disadvantages of DDBMS
• Complexity.
• Cost.
• Security.
• Integrity control more difficult.
• Lack of standards.
• Lack of experience.
• DB design more complex.
Distributed Relational DB Design
We have a group of tables and we want to distribute them among a group of sites.
Consists of 3 major steps:
1. Fragmentation: divide a relation into a number of sub-relations (fragments), horizontally or vertically.
2. Replication: make a copy of a fragment.
3. Allocation: decide at which site each fragment and replica is to be stored.
Distributed Relational DB Design
When we fragment, replicate and allocate, we try to achieve:
• Locality of reference.
• Improved reliability and availability.
• Good performance.
• Balanced storage capacities and costs.
• Minimal communication costs.
Rules of Fragmentation
Completeness: nothing (rows or columns) gets lost when we fragment.
Reconstruction: we can get back the original table after we have fragmented it.
Disjointness: no row or column appears in 2 fragments (there is 1 exception: the primary key is repeated in every vertical fragment so that the table can be rebuilt).
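The three rules can be checked mechanically for a horizontal fragmentation. The following is a minimal sketch, assuming rows are modelled as Python dicts; the helper names (row_key, check_horizontal) and the three sample rows are illustrative only and are not part of the slides.

```python
# Sketch: checking the fragmentation rules for a horizontal fragmentation.
# Rows are dicts; a fragmentation is a list of row lists.
# The sample rows are a small illustrative subset, not a full table.

def row_key(row):
    """Hashable form of a row so rows can be compared as set members."""
    return tuple(sorted(row.items()))

def check_horizontal(relation, fragments):
    original = {row_key(r) for r in relation}
    pieces = [{row_key(r) for r in frag} for frag in fragments]
    union = set().union(*pieces)
    return {
        # Completeness: every original row lands in some fragment.
        "complete": original <= union,
        # Reconstruction: the union of the fragments gives the table back.
        "reconstructs": union == original,
        # Disjointness: no row is stored in two fragments.
        "disjoint": sum(len(p) for p in pieces) == len(union),
    }

relation = [
    {"PropertyNo": "PA14", "Type": "House"},
    {"PropertyNo": "PG4", "Type": "Flat"},
    {"PropertyNo": "PG21", "Type": "House"},
]
houses = [r for r in relation if r["Type"] == "House"]
flats = [r for r in relation if r["Type"] == "Flat"]

good = check_horizontal(relation, [houses, flats])
# Deliberately store one house row twice to break disjointness.
bad = check_horizontal(relation, [houses, flats + houses[:1]])
```

Here good satisfies all three rules, while bad is still complete and reconstructible but no longer disjoint.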
Types of Fragmentation
Horizontal fragmentation
Vertical fragmentation
Mixed fragmentation
Original PropertyForRent Table
PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005
Horizontal Fragmentation
Based on type of property.
P1: σ_{Type='House'}(PropertyForRent)
P2: σ_{Type='Flat'}(PropertyForRent)
Fragment P1
PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003
Fragment P2
PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005
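Horizontal fragmentation is a selection on the rows, and the original table comes back as the union of the fragments. The sketch below illustrates this in Python, assuming each property is reduced to a (PropertyNo, Type) pair for brevity; the select helper is a hypothetical name, not part of any DBMS.

```python
# Sketch: horizontal fragmentation as selection, reconstruction as union.
# Only PropertyNo and Type are kept to keep the example short.
property_for_rent = [
    ("PA14", "House"), ("PG4", "Flat"), ("PG16", "Flat"),
    ("PG21", "House"), ("PG36", "Flat"), ("PL94", "Flat"),
]

def select(relation, predicate):
    """Return the rows of the relation that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

p1 = select(property_for_rent, lambda row: row[1] == "House")
p2 = select(property_for_rent, lambda row: row[1] == "Flat")

# Reconstruction: the union of the fragments is the original table.
rebuilt = sorted(p1 + p2)
```

Because every row has exactly one Type value, the two predicates split the table without overlap and without losing rows.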
Original Staff Table
StaffNo  Position    Sex  DOB        Salary  FName  LName  BranchNo
SL21     Manager     M    1 Oct 93   30000   John   White  B005
SG37     Assistant   F    10 Nov 60  12000   Ann    Beech  B003
SG14     Supervisor  M    24 Mar 58  18000   David  Ford   B003
SG5      Assistant   F    3 Jun 40   24000   Susan  Brand  B007
Vertical Fragmentation
S1: Π_{StaffNo, Position, Sex, DOB, Salary}(Staff)
S2: Π_{StaffNo, FName, LName, BranchNo}(Staff)
Fragment S1
StaffNo  Position    Sex  DOB        Salary
SL21     Manager     M    1 Oct 93   30000
SG37     Assistant   F    10 Nov 60  12000
SG14     Supervisor  M    24 Mar 58  18000
SG5      Assistant   F    3 Jun 40   24000
Fragment S2
StaffNo  FName  LName  BranchNo
SL21     John   White  B005
SG37     Ann    Beech  B003
SG14     David  Ford   B003
SG5      Susan  Brand  B007
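Vertical fragmentation is a projection on the columns, and the original table comes back by joining the fragments on the shared primary key. The following sketch shows this in Python, assuming rows are dicts and leaving out the Sex and DOB columns for brevity; project and natural_join are hypothetical helper names.

```python
# Sketch: vertical fragmentation as projection, reconstruction as a join
# on the primary key StaffNo. Sex and DOB are omitted to keep it short.
staff = [
    {"StaffNo": "SL21", "Position": "Manager", "Salary": 30000,
     "FName": "John", "LName": "White", "BranchNo": "B005"},
    {"StaffNo": "SG37", "Position": "Assistant", "Salary": 12000,
     "FName": "Ann", "LName": "Beech", "BranchNo": "B003"},
    {"StaffNo": "SG14", "Position": "Supervisor", "Salary": 18000,
     "FName": "David", "LName": "Ford", "BranchNo": "B003"},
    {"StaffNo": "SG5", "Position": "Assistant", "Salary": 24000,
     "FName": "Susan", "LName": "Brand", "BranchNo": "B007"},
]

def project(relation, columns):
    """Keep only the named columns of every row."""
    return [{c: row[c] for c in columns} for row in relation]

def natural_join(left, right, key):
    """Join two fragments on a shared key column."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

# Both fragments keep StaffNo: this is the exception to disjointness.
s1 = project(staff, ["StaffNo", "Position", "Salary"])
s2 = project(staff, ["StaffNo", "FName", "LName", "BranchNo"])

rebuilt = natural_join(s1, s2, "StaffNo")
```

Without StaffNo in both fragments there would be no way to match a row of S1 with its counterpart in S2.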
Mixed Fragmentation – Vertical then Horizontal
S1: Π_{StaffNo, Position, Sex, DOB, Salary}(Staff)
S2: Π_{StaffNo, FName, LName, BranchNo}(Staff)
S2.1: σ_{BranchNo='B005'}(S2)
S2.2: σ_{BranchNo='B003'}(S2)
S2.3: σ_{BranchNo='B007'}(S2)
Fragment S1
StaffNo  Position    Sex  DOB        Salary
SL21     Manager     M    1 Oct 93   30000
SG37     Assistant   F    10 Nov 60  12000
SG14     Supervisor  M    24 Mar 58  18000
SG5      Assistant   F    3 Jun 40   24000
Fragment S2.1
StaffNo  FName  LName  BranchNo
SL21     John   White  B005
Fragment S2.2
StaffNo  FName  LName  BranchNo
SG37     Ann    Beech  B003
SG14     David  Ford   B003
Fragment S2.3
StaffNo  FName  LName  BranchNo
SG5      Susan  Brand  B007
Derived Horizontal Fragmentation
Derived horizontal fragmentation is the horizontal fragmentation of a table (child), T1, that follows from the horizontal fragmentation of another, related table (parent), T2.
It is not explicitly specified in the design but implied by the fragmentation of T2.
T1 (child) has a foreign key that references T2 (parent).
The relationship between T1 and T2 is either 1-to-1 or many-to-1.
Use the semi-join operation: fragment i of T1 = T1 ⋉ (fragment i of T2), matched on the foreign key.
Derived Horizontal Fragmentation
You were required by the design to horizontally fragment the Staff table.
S1: σ_{BranchNo='B003'}(Staff)
S2: σ_{BranchNo='B005'}(Staff)
S3: σ_{BranchNo='B007'}(Staff)
StaffNo  Position    Sex  DOB        Salary  FName  LName  BranchNo
SL21     Manager     M    1 Oct 93   30000   John   White  B005
SG37     Assistant   F    10 Nov 60  12000   Ann    Beech  B003
SG14     Supervisor  M    24 Mar 58  18000   David  Ford   B003
SG5      Assistant   F    3 Jun 40   24000   Susan  Brand  B007
Derived Horizontal Fragmentation
Fragment S1
StaffNo  Position    Sex  DOB        Salary  FName  LName  BranchNo
SG37     Assistant   F    10 Nov 60  12000   Ann    Beech  B003
SG14     Supervisor  M    24 Mar 58  18000   David  Ford   B003
Fragment S2
StaffNo  Position    Sex  DOB        Salary  FName  LName  BranchNo
SL21     Manager     M    1 Oct 93   30000   John   White  B005
Fragment S3
StaffNo  Position    Sex  DOB        Salary  FName  LName  BranchNo
SG5      Assistant   F    3 Jun 40   24000   Susan  Brand  B007
Derived Horizontal Fragmentation
After we fragmented Staff, we found out that there is a table related to it, PropertyForRent.
Because Staff is now fragmented, it makes sense to fragment PropertyForRent too.
[Figure: Staff (1) handles (N) PropertyForRent.]
S1: σ_{BranchNo='B003'}(Staff)
S2: σ_{BranchNo='B005'}(Staff)
S3: σ_{BranchNo='B007'}(Staff)
Pi: PropertyForRent ⋉_{StaffNo} Si, for i = 1, 2, 3
Original PropertyForRent Table
PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005
Derived Horizontal Fragmentation
Fragment P1
PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
Fragment P2
PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005
Fragment P3
PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
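The semi-join behind derived horizontal fragmentation keeps the child rows whose foreign key matches a row in the corresponding parent fragment. The sketch below is one way to express it in Python. The staff rows follow the Staff table above, but the property rows are hypothetical (PX1 and PX2 are made-up property numbers) and were chosen so that every StaffNo has a matching staff row; fragment_staff and semi_join are hypothetical helper names.

```python
# Sketch: derived horizontal fragmentation via a semi-join on StaffNo.
# Staff rows are (StaffNo, BranchNo); property rows are (PropertyNo, StaffNo).
staff = [("SL21", "B005"), ("SG37", "B003"), ("SG14", "B003"), ("SG5", "B007")]
# Hypothetical property rows; PX1 and PX2 are invented for illustration.
properties = [("PG4", "SG14"), ("PG21", "SG37"), ("PX1", "SL21"), ("PX2", "SG5")]

def fragment_staff(staff_rows, branch_no):
    """Primary horizontal fragment of the parent: staff at one branch."""
    return [row for row in staff_rows if row[1] == branch_no]

def semi_join(child_rows, parent_rows):
    """Keep child rows whose StaffNo matches some parent row.

    Only child columns are returned; the parent contributes no columns.
    """
    parent_keys = {staff_no for staff_no, _ in parent_rows}
    return [row for row in child_rows if row[1] in parent_keys]

branches = ["B003", "B005", "B007"]
staff_fragments = {b: fragment_staff(staff, b) for b in branches}
property_fragments = {
    b: semi_join(properties, staff_fragments[b]) for b in branches
}
```

Each property ends up in the fragment of the staff member who handles it, so a property and its staff row can be stored at the same site.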
Transparencies in a DDBMS
4 main transparencies:
1. Distribution Transparency.
   a. Fragmentation.
   b. Location.
   c. Replication.
   d. Local Mapping.
   e. Naming.
2. Transaction Transparency.
3. Performance Transparency.
4. DBMS Transparency.
1. Distribution Transparency
Allows the user to perceive the DB as a single, logical entity. Types:
a. Fragmentation: the user does not need to know the data is fragmented.
b. Location: the user does not need to know the location of fragments.
c. Replication: the user does not need to know the fragments are replicated.
d. Local Mapping: the user specifies the fragment and its location.
e. Naming: DDBMS makes sure every item name is unique.
Consider the distribution of the Staff relation:
S1: Π_{StaffNo, Position, Sex, DOB, Salary}(Staff)
S2: Π_{StaffNo, FName, LName, BranchNo}(Staff)
S21: σ_{BranchNo='B003'}(S2)
S22: σ_{BranchNo='B005'}(S2)
S23: σ_{BranchNo='B007'}(S2)
a. Fragmentation Transparency
Highest level of distribution transparency.
The user does not need to know that the data is fragmented.
The user treats the DDB like a centralized DB.
Database access is based on the global schema.
Fragmentation of the data can be changed without impacting the user.
Example:
SELECT Fname, Lname
FROM Staff
WHERE position = 'Manager';
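One way to picture fragmentation transparency is a view that rebuilds the global Staff relation from its fragments, so that the query above runs unchanged. The following is a minimal sketch using Python's sqlite3 module, assuming all fragments sit in one in-memory database as a stand-in for separate sites; a real DDBMS would map the global schema to remote fragments itself. The Sex and DOB columns are omitted for brevity.

```python
import sqlite3

# Sketch: a single in-memory database stands in for the distributed sites.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S1  (staffNo TEXT PRIMARY KEY, position TEXT, salary INTEGER);
CREATE TABLE S21 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);
CREATE TABLE S22 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);
CREATE TABLE S23 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);

INSERT INTO S1 VALUES ('SL21', 'Manager', 30000), ('SG37', 'Assistant', 12000),
                      ('SG14', 'Supervisor', 18000), ('SG5', 'Assistant', 24000);
INSERT INTO S21 VALUES ('SG37', 'Ann', 'Beech', 'B003'), ('SG14', 'David', 'Ford', 'B003');
INSERT INTO S22 VALUES ('SL21', 'John', 'White', 'B005');
INSERT INTO S23 VALUES ('SG5', 'Susan', 'Brand', 'B007');

-- The global relation: union the horizontal fragments, then join the
-- vertical fragments on the primary key.
CREATE VIEW Staff AS
SELECT a.staffNo, a.position, a.salary, b.fName, b.lName, b.branchNo
FROM S1 AS a
JOIN (SELECT * FROM S21 UNION ALL SELECT * FROM S22 UNION ALL SELECT * FROM S23) AS b
  ON a.staffNo = b.staffNo;
""")

# The user's query mentions only the global schema, never a fragment.
managers = conn.execute(
    "SELECT fName, lName FROM Staff WHERE position = 'Manager'"
).fetchall()
```

If the fragmentation changes, only the view definition has to change; the user's query stays the same.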
b. Location Transparency
The middle level of distribution transparency.
The user must know that the data is fragmented but still does not need
to know the location of the data.
Data location can be changed without impact on the user.
Example:
SELECT Fname, Lname FROM S21
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position='Manager')
UNION
SELECT Fname, Lname FROM S22
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position='Manager')
UNION
SELECT Fname, Lname FROM S23
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position='Manager');
c. Replication Transparency
The user is unaware of replication and location but knows that the data is fragmented.
On the same level as location transparency.
d. Local Mapping Transparency
The lowest level of distribution transparency.
The user knows that the data is fragmented and the location of the data.
Example:
SELECT Fname, Lname FROM S21 AT SITE 3
WHERE staffNo IN
(SELECT staffNo FROM S1 AT SITE 5 WHERE position='Manager')
UNION
SELECT Fname, Lname FROM S22 AT SITE 5
WHERE staffNo IN
(SELECT staffNo FROM S1 AT SITE 5 WHERE position='Manager')
UNION
SELECT Fname, Lname FROM S23 AT SITE 7
WHERE staffNo IN
(SELECT staffNo FROM S1 AT SITE 5 WHERE position='Manager');
e. Naming Transparency
Each item in a distributed database must have a unique name.
The DDBMS must ensure that no two sites create an object with the same name.
Solutions:
• Create a central name server. Drawbacks: it becomes a bottleneck and works against local autonomy.
• Prefix each object name with the identifier of the site that created it. Drawback: loss of distribution transparency.
2. Transaction Transparency
All transactions must ensure the consistency and integrity of the DDB.
Each transaction that needs to access data at multiple sites is divided into multiple sub-transactions.
Even if a transaction is split, atomicity has to be maintained.
3. Performance Transparency
The DDBMS performs as if it were a centralized DBMS.
Performance should not suffer because the system is distributed (network communication cost).
When a site issues a query, the system must figure out the fastest way of executing it.
The Distributed Query Processor (DQP) must figure out:
• Which fragment to access.
• Which copy of the fragment to access (if replication is used).
• Where the fragments are located.
3. Performance Transparency
Consider the following distributed DB:
Property(PropertyNo, city): 10,000 records in London
Client(ClientNo, maxPrice): 100,000 records in Glasgow
Viewing(PropertyNo, ClientNo): 1,000,000 records in London
The London site wants to list properties in Aberdeen that have been viewed by clients who have a maximum price limit greater than 200,000.
SELECT p.propertyNo
FROM Property p INNER JOIN
(Client c INNER JOIN Viewing v ON c.clientNo = v.clientNo)
ON p.propertyNo = v.propertyNo
WHERE p.city = 'Aberdeen' AND
c.maxPrice > 200000;
3. Performance Transparency
After the query is issued, DDBMS must determine the most cost-effective strategy to execute the query.
Strategies:
1. Move Client table to London and process query there.
2. Move Property and Viewing relation to Glasgow and process query there then return result.
3. Join Property and Viewing at London, project only property number and client number and move result to Glasgow to join with clients with maxPrice > 200,000 then return results.
4. Select clients at Glasgow with maxPrice > 200000, move them to London and join with viewing and Aberdeen property.
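To see why the choice of strategy matters, the communication time of each strategy can be estimated from the number of tuples it ships. The sketch below uses a simple model (a fixed delay per message plus transfer time) with purely hypothetical parameters: 100 bytes per tuple, 10,000 bytes per second, a 1 second delay per message, and 10 clients matching the maxPrice condition. None of these figures come from the slides; only the table sizes do. Strategy 3 is left out because its cost depends on the size of the join result, and the cost of returning the result in strategy 2 is ignored.

```python
# Sketch: rough communication-time estimates for strategies 1, 2 and 4.
# All three constants below are assumptions chosen for illustration.
TUPLE_BYTES = 100        # assumed size of one tuple
RATE = 10_000            # assumed transfer rate in bytes per second
DELAY = 1.0              # assumed fixed delay per message in seconds

def transfer_seconds(tuples, messages=1):
    """Time to ship a number of tuples in a number of messages."""
    return messages * DELAY + tuples * TUPLE_BYTES / RATE

MATCHING_CLIENTS = 10    # hypothetical count of clients with maxPrice > 200000

costs = {
    # Strategy 1: ship the whole Client table (100,000 records) to London.
    "1: ship Client to London": transfer_seconds(100_000),
    # Strategy 2: ship Property and Viewing to Glasgow (result return ignored).
    "2: ship Property and Viewing to Glasgow":
        transfer_seconds(10_000 + 1_000_000, messages=2),
    # Strategy 4: select at Glasgow first, ship only the matching clients.
    "4: ship selected clients to London": transfer_seconds(MATCHING_CLIENTS),
}

best = min(costs, key=costs.get)
```

Under these assumptions, selecting at Glasgow before shipping is cheaper by several orders of magnitude, which is the kind of decision the DQP has to make for every distributed query.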
4. DBMS Transparency
Hides the fact that different sites have different local DBMSs.
Applies only to heterogeneous DDBMSs.
Date’s 12 Rules for a DDBMS
1. Local autonomy.
2. No reliance on a central site.
3. Continuous operation.
4. Location independence.
5. Fragmentation independence.
6. Replication independence.
7. Distributed query processing.
8. Distributed transaction processing.
9. Hardware independence.
10. Operating system independence.
11. Network independence.
12. Database independence.