Overview of Distributed Databases
Lecture Notes INFS7907 © School of Information Technology and Electrical Engineering
CS544 Module 1 © Shazia Sadiq, ITEE/UQ
Overview of Distributed Databases
• Topics
– Concepts
– Benefits & Problems
– Architectures
• Readings
– Lecture Notes
– Özsu & Valduriez, selected sections of chapters 1 & 4
Why distributed?
• Corresponds to the organizational structure of distributed enterprises
– Electronic commerce
– Manufacturing control systems
• Divide and conquer
– Work towards a common goal
• Economic benefits
– More computing power
– More discipline in smaller but autonomous groups
… Why distributed?
• To solve complex problems by dividing them into smaller pieces and assigning them to different software groups, which work on different computers
• To produce a system which, although it runs on multiple processing elements, can work effectively towards a common goal
Distributed Database Systems
Union of two opposed technologies:
– Network technology (distribute)
– Database technology (centralize)
Concepts
• Distributed Computing System
– A number of autonomous processing elements, not necessarily homogeneous, that are interconnected by a computer network and that cooperate in performing their assigned tasks
• Distributed Database
– A collection of multiple, logically interrelated databases, distributed over a computer network
• Distributed Database Management System
– Software that manages a distributed database while making the distribution transparent to the user
What is being distributed?
• Processing logic
– Inventory
– Personnel
– Sales
• Function
– Printing
– Email
• Data
Misconceptions
Distributed computing is not ...
– Parallel computing
– A computer network
– A backend processor
– A multi-processor system
Centralized Database
[Figure: Programs 1, 2 and 3 at Site 1 all access a single, central database]
Networked Architecture
[Figure: Sites 1–4 connected by a communications network; Programs 1, 2 and 3 run at Site 1, and databases reside at individual sites (e.g. Database2 at Site 2, Database3 at Site 3)]
Distributed Database Architecture
[Figure: databases at Sites 1–4 connected by a communications network; Programs 1, 2 and 3 at Site 1 access data across all sites as if it were a single database]
Benefits of DDBs
• Transparency
• Reliability
• Performance
• Expansion
A Distributed Application
[Figure: sites at Perth, Sydney, Darwin and Brisbane connected by a communications network, each holding its own data plus copies of other sites' data (e.g. Sydney data replicated at Perth and Brisbane)]

Transparency
• Separation of high-level semantics from low-level implementation issues
• Extension of the data independence concept in centralized databases
• Basic concepts
– Fragmentation
– Replication
Transparency Types in DDBMSs
• Network
• Replication
• Fragmentation

Layers of Transparency
[Figure: concentric layers around the data — Data Independence, then Network Transparency, Replication Transparency, Fragmentation Transparency, and Language Transparency outermost]
Network Transparency
• Protects the user from the operational details of the network
• Users do not have to specify where the data is located
– Location transparency
– Naming transparency
Replication Transparency
• Replicas (copies of data) are created for performance and reliability reasons
• Users should be unaware of the existence of these copies
• Replication causes update problems
Fragmentation Transparency
• Fragments are parts of a relation
– Vertical fragment: subset of columns
– Horizontal fragment: subset of tuples
• Fragments are also created for performance and reliability reasons
• Users should be unaware of the existence of fragments
Reliability
• Replicated components (data & software) eliminate single points of failure
• Failure of a single site or communication link will not bring down the entire system
• Managing the 'reachable' data requires distributed transaction support, which provides correctness in the presence of failures
Performance
• Data localization
– Reduces contention for CPU and I/O services
– Reduces communication overhead
• Parallelism
– Inter-query & intra-query parallelism
– The 'multiplex' approach
Expansion
• Expansion by adding processing and storage power to the network
• Replacing a mainframe vs. adding more micro-computers to the network
Problem Areas
• Database design
• Query processing
• Directory management
• Concurrency control
• Deadlock management
• Reliability issues
• Operating system support
• Heterogeneous databases
Database Design
• How to place database and applications
• Fragmentation or replication
– Disjoint fragments
– Full replication
– Partial replication
• Design issues
– Fragmentation
– Allocation
Query Processing
• Determining factors
– Distribution of data
– Communication costs
– Lack of locally available data
• Objective
– Maximum utilisation of inherent parallelism
Directory Management
• The directory contains information on data descriptions plus locations
• Problems similar to database design
– Global or local
– Centralised or distributed
– Single copy or multiple copies
Concurrency Control
• Integrity of the database as in centralised systems, plus consistency of multiple copies of the database (mutual consistency)
• Approaches
– Pessimistic: synchronising before execution
– Optimistic: checking after execution
• Strategies
– Locking
– Timestamping
Deadlock Management
• Similar solutions as for centralised systems
– Prevention
– Avoidance
– Detection
– Recovery
Reliability Issues
• Reliability is an advantage, but it does not come automatically
• On failure of a site
– Databases at operational sites must remain consistent and up to date
– The database at the failed site must recover after the failure
Operating System Support
• Single-processor systems
– Memory management
– File system and access methods
– Crash recovery
– Process management
• Distributed environments (in addition to the above)
– Dealing with multiple layers of network software
Heterogeneous Databases
• Databases differ in
– Data model
– Data language
• Translation mechanisms for data and programs
• Multi-database systems
– Building a distributed DBMS from a number of autonomous, centralised databases
Architectural Models
• Architecture defines structure
– Components
– Functionality of components
– Interactions and interrelationships
• The ANSI/SPARC architecture for centralised DBMSs
[Figure: multiple External Views at the top map to a single Conceptual Schema, which maps to the Internal Schema]
Classification
1. Autonomy
2. Distribution
3. Heterogeneity
[Figure: Autonomy, Distribution and Heterogeneity as three orthogonal axes]
Autonomy
• Distribution of control (not data)
– Design
– Communication
– Execution
• Alternatives
– Tight integration
– Semi-autonomous
– Total isolation / fully autonomous
Distribution
• Distribution of data
• Physical distribution over multiple sites
• Alternatives
– Non-distributed / centralised
– Client-server
– Peer-to-peer / fully distributed
Heterogeneity
• Different database systems
• Major differences
– Data models (relational, hierarchical)
– Data languages (SQL, QBE)
– Transaction management protocols
Architectural Alternatives
• Autonomy (A)
– A0: Tight integration
– A1: Semi-autonomous
– A2: Total isolation
• Distribution (D)
– D0: Non-distributed
– D1: Client-server
– D2: Peer-to-peer
• Heterogeneity (H)
– H0: Homogeneous
– H1: Heterogeneous
[Figure: the alternatives plotted in the (Autonomy, Distribution, Heterogeneity) space, from (A0, D0, H0) to (A2, D2, H1)]
3 * 3 * 2 = 18 alternatives; some alternatives are meaningless or not practical!
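The full design space can be enumerated directly; a quick sketch, with labels taken from the taxonomy above:

```python
# Enumerate the 3 * 3 * 2 = 18 architectural alternatives of the
# (Autonomy, Distribution, Heterogeneity) classification.
from itertools import product

autonomy = ["A0: tight integration", "A1: semi-autonomous", "A2: total isolation"]
distribution = ["D0: non-distributed", "D1: client-server", "D2: peer-to-peer"]
heterogeneity = ["H0: homogeneous", "H1: heterogeneous"]

alternatives = list(product(autonomy, distribution, heterogeneity))
print(len(alternatives))   # 18
```

As the slide notes, not every point in this space corresponds to a practical system.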
Architectural Alternatives
• (A0, D0, H0)
– A collection of logically integrated DBMSs on the same site, also called composite systems
• (A0, D0, H1)
– Provides integrated access to heterogeneous systems on a single machine
Architectural Alternatives
• (A0, D1, H0)
– Client-server distribution
• (A0, D2, H0)
– Fully distributed
• (A1, D0, H0)
– Semi-autonomous systems, also called federated systems; each DBMS knows how to participate in the federation
Architectural Alternatives
• (A1, D0, H1)
– Heterogeneous federated DBMSs
• (A1, D1, H1)
– Distributed heterogeneous federated DBMSs
• (A2, D0, H0)
– Multi-database systems; complete homogeneity in component systems is unlikely
Architectural Alternatives
• (A2, D0, H1)
– Heterogeneous multi-databases; similar to (A1, D0, H1), but with full autonomy
• (A2, D1, H1), (A2, D2, H1)
– Distributed multi-database systems
Major DDBMS Architectures
• (Ax, D1, Hy)
– Client-server architecture
• (A0, D2, H0)
– Peer-to-peer architecture
• (A2, Dx, Hy)
– Multi-database architecture
Client-Server Architectures
• Distribute the functionality between client and server to better manage the complexity of the DBMS
• Two-level architecture
• Typical scenario
1. The client parses a query, decomposes it into independent site queries, and sends each to the appropriate server
2. Each server processes its local query and sends the result relation to the client
3. The client combines the results of the sub-queries
[Figure: User → Client → Server, with request and response arrows]
Peer-to-Peer Architecture
• Extension of the ANSI/SPARC architecture to include a Global Conceptual Schema (GCS), where the GCS is the union of all Local Conceptual Schemas (LCSs)
[Figure: External Schemas ES1, ES2, ... ESn map to the GCS; the GCS maps to LCS1, LCS2, ... LCSn, each backed by a Local Internal Schema LIS1, LIS2, ... LISn]
Multi-Database Architecture
• A GCS exists as the union of some LCSs only, or does not exist at all
[Figure, with GCS: Global External Schemas GES1 ... GESn map to a GCS built over some of the LCSs; each site i also has its own Local External Schemas LESi1 ... LESim over LCSi and LISi]
[Figure, without GCS: External Schemas ES1 ... ESn map directly to LCS1 ... LCSn and LIS1 ... LISn]
Next Lecture
Distributed Database Design, Query and Transactions
Distributed Database Design, Query and Transactions
• Topics
– Distributed database design
– Distributed DBMS
– Distributed query processing
– Distributed transaction management
• Readings
– Lecture Notes
– Özsu & Valduriez, selected sections of chapters 5, 7 & 12

The lecture notes for these topics contain many more slides than would actually be used for this lecture. These slides have been included for the benefit of the diverse student group.
Distributed Database Design
• Designing the network
• Distribution of the DBMS software
• Distribution of application programs
• Distribution of data
– Level of sharing
– Access pattern
– Level of knowledge
Framework of Distribution
[Figure: three dimensions of distribution — Sharing (data only, or data + programs), Access pattern (static or dynamic), and Knowledge of the access pattern (partial or complete)]
In all cases (except no sharing), the distributed environment has new problems that are not relevant in the centralized setup.
Design Strategies
• Top-down strategy
• Bottom-up strategy
• Choice of strategy depends on the architectural model
Taxonomy of Global Information Sharing
• Distributed databases
– Peer-to-peer architecture (A0, D2, H0)
– Local systems are homogeneous, and the global system has control over local data and processing
[Figure: ES1 ... ESn over a GCS, over LCS1 ... LCSn and LIS1 ... LISn]
• Multi-databases
– Multi-database architecture (A2, Dx, Hy)
– Fully autonomous, heterogeneous local systems with no global schema
[Figure: ES1 ... ESn map directly to LCS1 ... LCSn and LIS1 ... LISn, with no GCS]
Top-Down Design
[Figure: Requirements Analysis produces Objectives; Conceptual Design yields the GCS and View Design yields the External Schemas, linked by View Integration and access information; Distribution Design maps the GCS to LCSs; Physical Design produces the Physical Schemas; Observation & Monitoring provides feedback to the earlier stages; user input informs all design stages]
Bottom-Up Design
• Integration of pre-existing local schemas (LCSs) into a global conceptual schema (GCS)
• Local systems will most likely be heterogeneous
• Problems of integration and interoperability
Distributed Design Issues
• Fragmentation
– Reasons
– Types
– Degree
– Correctness
• Allocation
– Partitioned
– Partially replicated
– Fully replicated
• Required information: database information, application information, communication information, computer system information
Why Fragment?
• Application views are subsets of relations; a fragment is a subset of a relation
• If there is no fragmentation
– Entire table at one site: high volume of remote data accesses
– Entire table replicated: problems in executing updates
• Parallel execution of queries, since sub-queries operate on fragments: higher concurrency
Drawbacks of Fragmentation
• Application views do not always require mutually exclusive fragments; queries spanning multiple fragments suffer performance degradation (costly joins and unions)
• Semantic integrity control: checking dependencies among attributes that belong to several fragments may require access to several sites
Types of Fragmentation
• Horizontal
• Vertical
• Hybrid
Horizontal Fragmentation
PERSON
NAME            ADDRESS        PHONE      SAL
John Smith      44, Here St    3456 7890  34000
Alex Brown      1, High St     3678 1234  48000
Harry Potter    99, Magic St   9976 4321  98000
Jane Morgan     87, Riverview  8765 1237  65800
Peter Jennings  65, Flag Rd    9851 1238  23980

PERSON-FRAGMENT1
NAME            ADDRESS        PHONE      SAL
John Smith      44, Here St    3456 7890  34000
Alex Brown      1, High St     3678 1234  48000

PERSON-FRAGMENT2
NAME            ADDRESS        PHONE      SAL
Harry Potter    99, Magic St   9976 4321  98000
Jane Morgan     87, Riverview  8765 1237  65800
Peter Jennings  65, Flag Rd    9851 1238  23980
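Horizontal fragmentation is just a pair of selections whose union recovers the relation; a minimal sketch, assuming a hypothetical split predicate (the slide's split is illustrative):

```python
# Horizontal fragmentation of PERSON as two selections; the split predicate
# (membership in `split`) is an assumption, not taken from the slide.
person = [
    ("John Smith", "44, Here St", "3456 7890", 34000),
    ("Alex Brown", "1, High St", "3678 1234", 48000),
    ("Harry Potter", "99, Magic St", "9976 4321", 98000),
    ("Jane Morgan", "87, Riverview", "8765 1237", 65800),
    ("Peter Jennings", "65, Flag Rd", "9851 1238", 23980),
]
split = {"John Smith", "Alex Brown"}                    # assumed predicate

fragment1 = [t for t in person if t[0] in split]        # sigma_p(PERSON)
fragment2 = [t for t in person if t[0] not in split]    # sigma_not_p(PERSON)

# Reconstruction by union recovers the original relation.
assert sorted(fragment1 + fragment2) == sorted(person)
print(len(fragment1), len(fragment2))   # 2 3
```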
Vertical Fragmentation
PERSON
NAME            ADDRESS        PHONE      SAL
John Smith      44, Here St    3456 7890  34000
Alex Brown      1, High St     3678 1234  48000
Harry Potter    99, Magic St   9976 4321  98000
Jane Morgan     87, Riverview  8765 1237  65800
Peter Jennings  65, Flag Rd    9851 1238  23980

PERSON (fragment 1)
NAME            ADDRESS        PHONE
John Smith      44, Here St    3456 7890
Alex Brown      1, High St     3678 1234
Harry Potter    99, Magic St   9976 4321
Jane Morgan     87, Riverview  8765 1237
Peter Jennings  65, Flag Rd    9851 1238

PERSON (fragment 2)
NAME            SAL
John Smith      34000
Alex Brown      48000
Harry Potter    98000
Jane Morgan     65800
Peter Jennings  23980

The primary key [NAME] is always included in every vertical fragment.
Hybrid Fragmentation
PERSON
NAME            ADDRESS        PHONE      SAL
John Smith      44, Here St    3456 7890  34000
Alex Brown      1, High St     3678 1234  48000
Harry Potter    99, Magic St   9976 4321  98000
Jane Morgan     87, Riverview  8765 1237  65800
Peter Jennings  65, Flag Rd    9851 1238  23980

PERSON (fragment 1: vertical and horizontal combined)
NAME        ADDRESS      PHONE
John Smith  44, Here St  3456 7890
Alex Brown  1, High St   3678 1234

PERSON (fragment 2: vertical and horizontal combined)
NAME            SAL
Harry Potter    98000
Jane Morgan     65800
Peter Jennings  23980
Degree of Fragmentation
[Figure: a spectrum from no fragmentation at one extreme to fragmentation into individual tuples (horizontal extreme) or individual columns (vertical extreme) at the other; hybrid fragmentation lies in between]
Correctness of Fragmentation
Rules ensure that the database does not undergo semantic change during fragmentation:
– Completeness
Each data item found in a relation R must be found in one or more of R's fragments R1, R2, ..., Rn
– Reconstruction
The dependency constraints on the data can be preserved by ensuring that every relation R can be reconstructed from its fragments using an operator ∇, such that R = ∇ Ri, ∀ Ri ∈ FR
– Disjointness
A relation R is decomposed into disjoint horizontal fragments, such that if a data item di ∈ Rj, then di ∉ Rk where j ≠ k. For vertical fragments, disjointness does not apply to primary key attributes
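For horizontal fragments the three rules are mechanically checkable; a minimal sketch over assumed data, with union as the reconstruction operator ∇:

```python
# Check completeness, reconstruction and disjointness for a horizontal
# fragmentation, representing relations as sets of tuples (assumed data).
R = {(1, "a"), (2, "b"), (3, "c")}
fragments = [{(1, "a")}, {(2, "b"), (3, "c")}]

# Completeness: every tuple of R appears in some fragment.
complete = all(any(t in f for f in fragments) for t in R)

# Reconstruction: for horizontal fragments the operator 'nabla' is union.
reconstructed = set().union(*fragments) == R

# Disjointness: no tuple appears in two different fragments.
disjoint = all(f1.isdisjoint(f2)
               for i, f1 in enumerate(fragments)
               for f2 in fragments[i + 1:])
print(complete, reconstructed, disjoint)   # True True True
```

For vertical fragments, reconstruction would be a join on the primary key, and the disjointness check would exclude the key attributes.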
Integrity Constraints Issues
Consider an example with the following global schema over attributes A B C D E F G H I J:
R1 = ABCD, R2 = DEFG, R3 = FGHIJ
We propose the following design:
Site 1: R11 = ABC, R21 = DEF, R31 = FGH
Site 2: R12 = ABD, R22 = DG, R32 = FGIJ
Obviously:
R11 ⋈ R12 = R1
R21 ⋈ R22 = R2
R31 ⋈ R32 = R3
Integrity Constraints Issues
Let r2(R2) and r3(R3) be the following relations:
r2 (D E F G)        r3 (F G H I J)
    1 2 3 1             3 1 0 2 0
    2 4 3 1             3 2 1 1 1
Their fragments are:
r21 (D E F)         r22 (D G)
    1 2 3               1 1
    2 4 3               2 1
r31 (F G H)         r32 (F G I J)
    3 1 0               3 1 2 0
    3 2 1               3 2 1 1
Integrity Constraints Issues
Global schema: R1 = ABCD, R2 = DEFG, R3 = FGHIJ
Query: find all EFH, i.e. project E, F, H from r2 ⋈ r3:
r2 ⋈ r3 (D E F G H I J)
          1 2 3 1 0 2 0
          2 4 3 1 0 2 0
Result:
E F H
2 3 0
4 3 0
Integrity Constraints Issues
We execute the same query in our DDB, using the fragments at Site 1:
r21 (D E F)         r31 (F G H)
    1 2 3               3 1 0
    2 4 3               3 2 1
Find all EFH from r21 ⋈ r31:
E F H
2 3 0
4 3 0
2 3 1
4 3 1
The distributed answer contains two spurious tuples, because joining r21 and r31 on F alone loses the constraint that G must also match.
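The anomaly can be reproduced directly; a sketch computing both the global join and the fragment join on the relations above:

```python
# Reproduce the lossy-decomposition anomaly: joining the Site-1 fragments
# r21 and r31 yields spurious EFH tuples that the global join r2 |><| r3
# does not produce.
r2 = [(1, 2, 3, 1), (2, 4, 3, 1)]                    # r2(D, E, F, G)
r3 = [(3, 1, 0, 2, 0), (3, 2, 1, 1, 1)]              # r3(F, G, H, I, J)
r21 = [(d, e, f) for (d, e, f, g) in r2]             # r21(D, E, F)
r31 = [(f, g, h) for (f, g, h, i, j) in r3]          # r31(F, G, H)

# Global answer: join r2 and r3 on (F, G), then project E, F, H.
global_efh = sorted({(e, f, h)
                     for (d, e, f, g) in r2
                     for (f2, g2, h, i, j) in r3 if (f, g) == (f2, g2)})

# Distributed answer: join the fragments on F only (G was projected away).
local_efh = sorted({(e, f, h)
                    for (d, e, f) in r21
                    for (f2, g, h) in r31 if f == f2})

print(global_efh)   # [(2, 3, 0), (4, 3, 0)]
print(local_efh)    # [(2, 3, 0), (2, 3, 1), (4, 3, 0), (4, 3, 1)]
```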
Integrity Constraints Issues
R1 = ABCD, R2 = DEFG, R3 = FGHIJ
Fragments: ABC / ABD, DEF / DG, FGH / FGIJ
Often, more constraints must be maintained across the fragments.
Versions of HFs
• Primary horizontal fragmentation
– A selection operation on the owner relation of a database schema
– SELECT * FROM owner WHERE mi
(where mi is a minterm predicate)
• Derived horizontal fragmentation
– Defined on the member relation of a link according to the selection operation specified for its owner
– SELECT * FROM member
  WHERE member.join-attr IN
    (SELECT owner.join-attr FROM owner WHERE mj)
(where mj is a minterm predicate)
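Both fragment definitions can be run as ordinary SQL; a minimal sketch using Python's sqlite3 and the EMP/DEPT relations from the derived-HF example that follows (the predicate BUILDING = 'HAWKEN' is an assumed minterm):

```python
import sqlite3

# DEPT is the owner relation, EMP the member relation, linked by dept.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dept (dept TEXT PRIMARY KEY, building TEXT)")
cur.execute("CREATE TABLE emp (name TEXT, dept TEXT, sal INTEGER)")
cur.executemany("INSERT INTO dept VALUES (?, ?)",
                [("CSEE", "GP-SOUTH"), ("CHEMISTRY", "HAWKEN"), ("ARCHIT", "HAWKEN")])
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("John Smith", "CSEE", 34000), ("Alex Brown", "CHEMISTRY", 48000),
                 ("Harry Potter", "CHEMISTRY", 98000), ("Jane Morgan", "CSEE", 65800),
                 ("Peter Jennings", "CSEE", 23980), ("Melissa Dean", "ARCHIT", 60000)])

# Primary HF: a minterm predicate on the owner relation.
dept_frag = cur.execute(
    "SELECT * FROM dept WHERE building = 'HAWKEN'").fetchall()

# Derived HF: the member fragment follows the owner's selection.
emp_frag = cur.execute(
    "SELECT * FROM emp WHERE dept IN "
    "(SELECT dept FROM dept WHERE building = 'HAWKEN')").fetchall()
print(emp_frag)   # Alex Brown, Harry Potter, Melissa Dean
```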
Example of Derived HFs
Relations:
EMP
NAME            DEPT       SAL
John Smith      CSEE       34000
Alex Brown      CHEMISTRY  48000
Harry Potter    CHEMISTRY  98000
Jane Morgan     CSEE       65800
Peter Jennings  CSEE       23980
Melissa Dean    ARCHIT     60000

DEPT
DEPT       BUILDING
CSEE       GP-SOUTH
CHEMISTRY  HAWKEN
ARCHIT     HAWKEN

Fragments (co-located employees):
EMP (departments in HAWKEN)
NAME          DEPT       SAL
Alex Brown    CHEMISTRY  48000
Harry Potter  CHEMISTRY  98000
Melissa Dean  ARCHIT     60000

EMP (departments in GP-SOUTH)
NAME            DEPT  SAL
John Smith      CSEE  34000
Jane Morgan     CSEE  65800
Peter Jennings  CSEE  23980
Approaches for VFs
• Vertical fragmentation is more complex than horizontal fragmentation due to the total number of alternatives available
– HF: determined by the number of minterm predicates
– VF: determined by the number of non-primary-key attributes
• Heuristic approaches
– Grouping: bottom-up
– Splitting: top-down
Allocation
[Figure: spectrum of allocation alternatives — (1) partitioned, (2) partially replicated, (3) fully replicated; moving towards full replication improves the performance of read-only queries but worsens update problems]
Allocation Problem
Find the optimal distribution of fragments F = {F1, F2, ..., Fn} to sites S = {S1, S2, ..., Sm}, on which a set of applications Q = {q1, q2, ..., qq} is running.
Optimality can be defined through:
• Minimal cost
– Cost of storing and querying a fragment at a site
– Cost of updating a fragment at all sites
– Cost of data communication
• Performance
– Minimize response time
– Maximize system throughput
Guidelines for Distributed Design
• Given a database schema, how to decompose it into fragments and allocate them to distributed sites, while minimizing remote data access and update problems
– The most active 20% of queries account for 80% of the total data accesses
– The qualitative aspect of the data guides the fragmentation activity
– The quantitative aspect of the data guides the allocation activity
Information Requirements
• Fragmentation
– Database information
– Application information
• Allocation
– Communication network information
– Computer system information
Information Requirements for HF
• Database information
– Links between relations (owner/source → member/target, e.g. Department → Employees)
• Application information
– Predicates in user queries
• Simple: p1: DEPT = 'CSEE', p2: SAL > 30000
• Conjunctive (minterm) predicate: m1 = p1 ∧ p2, i.e. DEPT = 'CSEE' ∧ SAL > 30000
– Minterm selectivity: number of tuples returned for a given minterm
– Access frequency: access frequency of user queries and/or minterms
Information Requirements for VF
• Application information
– Place in one fragment the attributes usually accessed together
– Affinity: the bond between two attributes, based on how they are accessed by applications
– Affinity is measured via the usage/affinity matrix
Usage of an attribute A in query q: use(q, A) = 1 if used, 0 if not used
Affinity of two attributes Ai and Aj for a set of queries: aff(Ai, Aj)
– The usage/affinity matrix is used in vertical partitioning algorithms (clustering algorithm, partitioning algorithm)
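A minimal sketch of the usage and affinity measures, assuming the common definition in which aff(Ai, Aj) sums the access frequencies of the queries that use both attributes; the workload and frequencies are hypothetical:

```python
# Attribute usage and affinity over a hypothetical workload.
attrs = ["NAME", "ADDRESS", "PHONE", "SAL"]
# use[q][A] = 1 if query q accesses attribute A (assumed usage matrix)
use = {
    "q1": {"NAME": 1, "ADDRESS": 1, "PHONE": 1, "SAL": 0},
    "q2": {"NAME": 1, "ADDRESS": 0, "PHONE": 0, "SAL": 1},
    "q3": {"NAME": 0, "ADDRESS": 0, "PHONE": 0, "SAL": 1},
}
freq = {"q1": 10, "q2": 25, "q3": 30}   # access frequencies (assumed)

def aff(a1, a2):
    """Affinity: total frequency of queries that access both attributes."""
    return sum(freq[q] for q in use if use[q][a1] and use[q][a2])

affinity = {(a, b): aff(a, b) for a in attrs for b in attrs}
print(affinity[("NAME", "SAL")])   # 25: only q2 uses both NAME and SAL
```

A splitting algorithm would then cluster attribute pairs with high affinity into the same vertical fragment.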
Information Requirements for Allocation
• Database information
– Fragment selectivity: the number of tuples of a fragment Fi that need to be accessed in order to process a query qj
– The size of a fragment: size(Fi) = cardinality(Fi) * length-in-bytes(Fi)
• Network information
– Channel capacities, distances, protocol overhead
– Communication cost per frame between sites Si and Sj
Information Requirements for Allocation (continued)
• Application information
– Read access: the number of read accesses that a query qj makes to a fragment Fi during execution
– Update access: the number of update accesses that a query qj makes to a fragment Fi during execution
• Site information (knowledge of storage and processing capacity)
– The unit cost of storing data at Sk
– The cost of processing one unit of work at Sk
Distributed DBMS
• Components of distributed DBMS
• Query processing and optimization
• Transaction management
• Concurrency control
• Reliability issues
DDBMS Components
[Figure: the user processor comprises the user interface handler, semantic data controller, global query optimizer and distributed execution monitor, working against the external schema, global conceptual schema and global directory (GD/D); the data processor comprises the local query processor, local recovery manager and runtime support processor, working against the local conceptual schema, local internal schema and system log]
Distributed Query Processing
• The distributed query problem
• Layers of query processing
• Query decomposition
• Data localization
• Query optimization
Distributed Query Processing
• Distributed query processor
– Maps a high-level query on a distributed database into a sequence of database operations on fragments
– Database operations + communication operations
• Distributed query optimizer
– Minimizes a cost function based on
• computing resources (CPU, I/O)
• communication networks
A Simple Query Problem
EMP (ENO, ENAME, TITLE)
ASG (ENO, PNO, RESP, DUR)
Find the names of employees who are managing a project:
SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = 'Manager';
Algebraic Transformations
1. ∏ENAME (σ RESP = 'Manager' ∧ EMP.ENO = ASG.ENO (EMP × ASG))
2. ∏ENAME (EMP ⋈ENO (σ RESP = 'Manager' (ASG)))
Centralized: query 2 is 'better', since it avoids the costly Cartesian product.
Distributed: relational algebra is not sufficient to express execution strategies; we also need
– operations for exchanging data
– selection of the sites to process them
A Distributed Query Problem
∏ENAME (EMP ⋈ENO (σ RESP = 'Manager' (ASG)))
EMP and ASG are fragmented as
EMP1 = σ ENO ≥ 'E3' (EMP)
EMP2 = σ ENO < 'E3' (EMP)
ASG1 = σ ENO ≥ 'E3' (ASG)
ASG2 = σ ENO < 'E3' (ASG)
ASG1, ASG2, EMP1, and EMP2 are stored at sites 1, 2, 3, and 4. The result is expected at site 5!
Equivalent Strategies
Total cost is determined through tuple accesses and transfers.

Strategy A (total cost = 460):
ASG'1 = σ RESP = 'Manager' (ASG1) at site 1, ASG'2 = σ RESP = 'Manager' (ASG2) at site 2
ASG'1 is sent to site 3, ASG'2 to site 4
EMP'1 = EMP1 ⋈ENO ASG'1 at site 3, EMP'2 = EMP2 ⋈ENO ASG'2 at site 4
EMP'1 and EMP'2 are sent to site 5, where Result = EMP'1 ∪ EMP'2

Strategy B (total cost = 23000):
ASG1, ASG2, EMP1 and EMP2 are all sent to site 5, where
Result = (EMP1 ∪ EMP2) ⋈ENO σ RESP = 'Manager' (ASG1 ∪ ASG2)
Layers of Query Processing
At the control site:
1. Query decomposition (using the global schema): calculus query on distributed relations → algebraic query on distributed relations
2. Data localization (using the fragment schema): → fragment query
3. Global optimization (using statistics on fragments): → optimized fragment query with communication operations
At the local sites:
4. Local optimization (using the local schema): → optimized local queries
Query Decomposition
The same techniques as in the centralized case, since information on data distribution is not used:
• Step 1: Normalization
• Step 2: Analysis
• Step 3: Elimination of redundancy
• Step 4: Rewriting
Step 1: Normalization
• Transform arbitrarily complex queries into a normal form
• Two normal forms
– Conjunctive: (P11 ∨ P12 ∨ ... ∨ P1n) ∧ ... ∧ (Pm1 ∨ Pm2 ∨ ... ∨ Pmn)
– Disjunctive: (P11 ∧ P12 ∧ ... ∧ P1n) ∨ ... ∨ (Pm1 ∧ Pm2 ∧ ... ∧ Pmn)
• Example equivalence rules
– P1 ∧ (P2 ∨ P3) ⇔ (P1 ∧ P2) ∨ (P1 ∧ P3)
– ¬(P1 ∧ P2) ⇔ ¬P1 ∨ ¬P2, ...
• Example transformation
TITLE = 'Manager' AND SAL > 30000 AND (PROJ = 12 OR PROJ = 34)
⇔ (TITLE = 'Manager' ∧ SAL > 30000 ∧ PROJ = 12) ∨ (TITLE = 'Manager' ∧ SAL > 30000 ∧ PROJ = 34)
Step 2: Analysis
• Determine whether further processing of the normalized query is possible
• Type incorrectness
– Attributes or relation names are not defined in the global schema
– Operations are being applied to attributes of the wrong type
• Semantic incorrectness
– Components do not contribute to the generation of the result
– Detected with query graphs (nodes: result and operands; edges: joins and projections); a disconnected graph indicates incorrectness
• Example:
SELECT ENAME, DUR
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = 'Manager'
[Figure: query graph with nodes EMP, ASG and RESULT; a join edge EMP.ENO = ASG.ENO between EMP and ASG, a selection RESP = 'Manager' on ASG, and projection edges ENAME and DUR to RESULT]
Step 3: Elimination of Redundancy
Simplification of the query qualification (WHERE clause) using well-known idempotency rules:
– P ∧ false ⇔ false
– P ∨ P ⇔ P
– P1 ∧ (P1 ∨ P2) ⇔ P1
– P ∧ ¬P ⇔ false
– etc.
• Example redundancy
SAL > 30000 AND ¬(SAL ≤ 30000) ⇔ SAL > 30000
Step 4: Rewriting
• Step 4.1: Transformation
– Transform the relational calculus query into relational algebra, represented as an operator tree
– By applying transformation rules, many different but equivalent trees may be found
• Step 4.2: Restructuring
– Restructure the tree to improve performance
• Example:
SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = 'Manager'
[Figure: operator tree — ∏ENAME at the root, above ⋈ENO, whose operands are EMP and σ RESP = 'Manager' (ASG)]
Data Localization
• Localize the query using data distribution information
• Algebraic query on global relations → algebraic query on physical fragments
• Two-step process
– Localization
– Reduction
Data Localization
• Step 1: Map the distributed query to a fragment query using a localization program
If EMP1 = σ SAL > 30000 (EMP) and EMP2 = σ SAL ≤ 30000 (EMP), then the localization program for EMP is EMP = EMP1 ∪ EMP2
• Step 2: Simplify and restructure the fragment query using reduction techniques
The basic idea is to determine the intermediate results in the query that produce empty (horizontal) or useless (vertical) relations, and remove the sub-trees that produce them
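The two steps can be sketched for the EMP fragmentation above; a minimal sketch in which fragment predicates are modelled as SAL intervals (the interval encoding is an assumption made for illustration):

```python
# Localization and reduction for EMP = EMP1 U EMP2, where
# EMP1 = sigma_{SAL > 30000}(EMP) and EMP2 = sigma_{SAL <= 30000}(EMP).
# Each fragment's predicate is modelled as a SAL interval (exclusive lo, inclusive hi).
INF = float("inf")
fragments = {"EMP1": (30000, INF), "EMP2": (-INF, 30000)}

def localize_and_reduce(query_lo):
    """For the query sigma_{SAL > query_lo}(EMP): localization rewrites EMP
    as the union of all fragments, and reduction drops every fragment whose
    SAL interval cannot intersect (query_lo, inf), i.e. would be empty."""
    return [name for name, (lo, hi) in fragments.items() if hi > query_lo]

print(localize_and_reduce(40000))   # ['EMP1'] : EMP2 (SAL <= 30000) is pruned
print(localize_and_reduce(10000))   # ['EMP1', 'EMP2'] : both may contribute
```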
Query Optimization
• Query processing so far has been independent of fragment characteristics such as cardinality and replication
• Main objectives of the optimizer (the problem is NP-hard!)
– Find a strategy close to optimal
– Avoid bad strategies
• Choice of an optimal strategy requires prediction of execution costs
– Local processing (CPU, I/O) + communication
Distributed Query Optimization
• Global optimization (fragment query → optimized fragment query with communication operations, using statistics on fragments)
– Join ordering
– Semijoin-based algorithms
• Local optimization (→ optimized local queries, using the local schema)
– INGRES algorithm
– System R algorithm
Most optimizers try to optimize the ordering of joins.
Components of the Optimizer
• Search space
– The set of alternative, but equivalent, query execution plans
– Abstracted by means of operator trees
• Search strategy
– Explores the search space and selects the best plan using the cost model
– Defines how the plans are examined
• Distributed cost model
– Predicts the cost of a given execution plan
Distributed Cost Model
• Cost functions
– Total time = TCPU * #insts + TI/O * #I/Os + TMSG * #msgs + TTR * #bytes
– Response time = TCPU * seq-#insts + TI/O * seq-#I/Os + TMSG * seq-#msgs + TTR * seq-#bytes
• Database statistics
– Estimate the size of intermediate relations
– Based on statistical information: length of attributes, distinct values, number of tuples in the fragment, ...
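The total-time function above can be sketched directly; the unit costs and operation counts below are hypothetical values, not measurements:

```python
# Total time = T_CPU * #insts + T_IO * #I/Os + T_MSG * #msgs + T_TR * #bytes.
# All constants are assumed, illustrative unit costs in seconds.
T_CPU, T_IO, T_MSG, T_TR = 1e-6, 1e-3, 0.1, 1e-6

def total_time(n_insts, n_ios, n_msgs, n_bytes):
    """Sum the cost of every component of the execution strategy."""
    return (T_CPU * n_insts + T_IO * n_ios +
            T_MSG * n_msgs + T_TR * n_bytes)

# Example: ship a 100 kB fragment in 10 messages, then scan it locally.
cost = total_time(n_insts=50_000, n_ios=100, n_msgs=10, n_bytes=100_000)
print(round(cost, 3))   # 1.25 = 0.05 + 0.1 + 1.0 + 0.1
```

Response time would instead sum only the components on the longest sequential path, since independent sites work in parallel.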
Cardinalities of Intermediate Relations
• Formulas estimate the cardinalities of the results of the basic relational algebra operations
– Selection, projection, Cartesian product, join, semijoin, union, difference
• Examples
– card(∏(R)) = card(R) (an upper bound; exact when the projected attributes include a key of R)
– card(R × S) = card(R) * card(S)
Semijoins
• The dominant factor in the time required to process a query is the time spent on data transfer between sites (not the computation or retrieval from secondary storage)
• Example: suppose that table r is stored at site 1 and s at site 2, and we want to evaluate r ⋈ σA=a(s) at site 1. Two options:
1. Site 1 asks site 2 to transmit s, and then computes the selection itself
2. Site 1 transmits the selection condition to site 2, and site 2 sends only the selected records to site 1 (presumably much smaller than s)
Semijoin
Definition: let r(R) and s(S) be two relations. The semijoin of r with s, denoted r ⋉ s, is the relation ∏R(r ⋈ s); that is, r ⋉ s is the portion of r that joins with s.
Some important transformations:
∏R(r ⋈ s) = ∏R(r) ⋉ ∏R∩S(s) = r ⋉ ∏R∩S(s)
99
Semijoin Example

r (A, B) @S1:
(1,4) (1,5) (1,6) (2,4) (2,6) (3,7) (3,8) (3,9)

s (B, C, D) @S2:
(4,13,16) (4,14,16) (7,13,17) (10,14,16) (10,15,17) (11,14,16) (11,15,18) (12,15,16)

(1) We can transfer s to site 1 and compute the join at site 1 (24 values transferred)
(2) We compute r' = ∏B(r) at site 1, send r' to site 2, compute s' = (r' ⋈ s) and send s' to site 1 (15 values transferred)
(3) We compute (r ⋈ s) as (r ⋈ s')
Thus, r ⋈ s can be computed knowing only s'.

r' (B): 4 5 6 7 8 9
s' (B, C, D): (4,13,16) (4,14,16) (7,13,17)
100
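The semijoin strategy in the example can be sketched in Python: ship only the join-column values of r to the site holding s, then ship back just the matching tuples of s. The relation values are the ones from the slide above.

```python
r = [(1, 4), (1, 5), (1, 6), (2, 4), (2, 6), (3, 7), (3, 8), (3, 9)]  # r(A,B) at site 1
s = [(4, 13, 16), (4, 14, 16), (7, 13, 17), (10, 14, 16),
     (10, 15, 17), (11, 14, 16), (11, 15, 18), (12, 15, 16)]          # s(B,C,D) at site 2

r_prime = sorted({b for (_, b) in r})                 # pi_B(r): sent to site 2
s_prime = [t for t in s if t[0] in set(r_prime)]      # r' join s: sent back to site 1
join = [(a, b, c, d) for (a, b) in r
        for (b2, c, d) in s_prime if b == b2]         # r join s, using only s'

# Values shipped: 6 (r') + 9 (s') = 15, versus 24 for shipping all of s
print(r_prime)                                  # [4, 5, 6, 7, 8, 9]
print(s_prime)                                  # the 3 matching s tuples
print(len(r_prime) + 3 * len(s_prime), 3 * len(s))   # 15 24
```

The counts reproduce the slide: 15 values transferred with the semijoin strategy versus 24 without it.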
Distributed Transaction Management
• Transaction Management - Revisited
• Distributed Transactions
• Distributed Execution Monitor
• Advanced Transaction Models
101
Centralized transaction processing

[Figure: buffers. Database access operations read-item(X) / write-item(X) move the disk block containing X between the database on disk and a main-memory buffer]

Typical workload characteristics: well-defined, repetitive, high volume, update-oriented, controlled.

Interleaving (over time):
T1: read-item(X); X := X - N; write-item(X); read-item(Y); Y := Y + N; write-item(Y);
T2: read-item(X); X := X + M; write-item(X);
102
Transaction States

[State transition diagram]
begin-transaction: → Active
Active → Active (on read-item, write-item)
Active → Partially Committed (on end-transaction)
Partially Committed → Committed (on commit)
Active → Failed (on abort)
Partially Committed → Failed (on abort)
Committed → Terminated; Failed → Terminated
103
Desirable properties
ACID properties - enforced by the concurrency control and recovery methods of the DBMS:
– Atomicity
– Consistency
– Isolation
– Durability
104
Distributed Transactions
• Database Consistency
– No semantic integrity constraints are violated
• Transaction Consistency
– No incorrect updates during concurrent database accesses
• Mutual Consistency
– No difference in the values of a data item across all the replicas to which it belongs
105
Example
Consider the following example:
• A chain of department stores has many individual stores.
• Each store has a database of sales and inventory at that store.
• There may also be a central office database with data about employees, chain-wide inventory, credit card customers, and information about suppliers, such as outstanding orders.
• There is a copy of all the stores' sales data in a data warehouse, which is used to analyse and predict sales.
Source: [MUW2002]
106
… Example
A manager wants to query all the stores, find the inventory of toothbrushes, and have them moved from store to store in order to balance the inventory.
This transaction will have a local component* Ti at the ith store, and a global component T0 at the central office:
1. Component T0 is created at the central office
2. T0 sends a message to all stores instructing them to create Tis
3. Each Ti executes a query to determine the number of toothbrushes and returns the result to T0
4. T0 takes these numbers, determines (by some algorithm) the shipments required, and sends instructions to the affected stores. For example, "Store 10 to ship 500 toothbrushes to store 7"
5. Stores receiving these instructions update their inventory and perform the shipment

* A distributed transaction will consist of communicating transaction components, each at a different site
107
… Example
What can go wrong?

A bug in the algorithm causes store 10 to ship more toothbrushes than are available in its inventory, causing T10 to abort. However, T7 detects no problem, updates its inventory in anticipation of the incoming shipment, and commits.
• Atomicity of T is compromised (T10 never completed)
• The database is left in an inconsistent state

After returning the result (step 3) of its inventory count to T0, the machine at store 10 crashes. The instructions to ship are not received from T0.
• What should T10 do when its site recovers?
108
Problems
• Managing the commit/abort decision in distributed transactions (one component wants to abort while others want to commit)?
→ Two-phase commit
• Assuring serializability of transactions that involve components at several sites?
→ Distributed locking
109
Commit in Centralized Databases
All database access operations of the transaction have been executed successfully and their effect on the database has been recorded in the log.
System log entries:
[start-transaction, T]
[read-item, T, X]
[write-item, T, X, old-value, new-value]
[commit, T]
After the commit point, the transaction T is said to be committed.
110
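How the commit point is read off the log can be sketched as below. This is a minimal illustration with assumed record shapes (tuples mirroring the entries above), not the lecture's recovery algorithm: a transaction is committed iff a [commit, T] record appears in the log; a started transaction with no commit record must be undone on restart.

```python
# In-memory stand-in for the system log; record shapes are assumed here.
log = [
    ("start-transaction", "T1"),
    ("write-item", "T1", "X", 5, 4),    # old-value, new-value
    ("start-transaction", "T2"),
    ("commit", "T1"),
    ("write-item", "T2", "Y", 9, 12),
    # T2 has no [commit, T2] record: on restart its writes must be undone
]

committed = {rec[1] for rec in log if rec[0] == "commit"}
started = {rec[1] for rec in log if rec[0] == "start-transaction"}
to_undo = started - committed

print(committed)  # {'T1'}
print(to_undo)    # {'T2'}
```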
Commit in Distributed Databases?
Distributed DBMSs may encounter problems not found in a centralized environment:
• Dealing with multiple copies of the data
• Failure of an individual site
• Failure of communication links
Recording the effect of database access operations in the (local) system log is not a sufficient condition for distributed commit.
Source: [MUW2002]
111
A Distributed Execution Monitor
• Begin-transaction: basic housekeeping by the TM
• Read: if the data item is local, the TM retrieves it, else it selects one copy
• Write: the TM coordinates the updating of the data item value at all sites where it resides
• Commit: the TM coordinates the physical updating of all databases that contain a copy of each data item for which a write was issued
• Abort: the TM ensures that no effect of the transaction is reflected in the database

[Figure: Distributed Execution Monitor. The Transaction Manager (TM) receives Begin-Transaction, Read, Write, Commit, Abort and returns Results; it communicates with other TMs and passes operations to the Scheduler (SC), which communicates with other SCs and the data processors]

Go to 124
112
Classification of Transactions
• Distribution
– Transactions in Centralized DBMSs
– Transactions in Distributed DBMSs
• Duration
– Short-life transactions
– Long-life / long duration transactions
• Processing
– On-line / interactive transactions
– Batch transactions
• Grouping of Operations
– Flat transactions
– Nested transactions
113
Selected Adv. Transaction Models
• Nested Transactions:
– Grouping of operations into hierarchical structures
• Workflow (long duration) Transactions:
– Relaxation of the ACID Properties
114
Nested Transactions
A set of sub-transactions that may recursively contain other sub-transactions:

Begin-transaction Reservation
  Begin-transaction Airline … End. {Airline}
  Begin-transaction Hotel … End. {Hotel}
  Begin-transaction Car … End. {Car}
End
115
Types of Nested Transactions
• Closed Nested Transactions
– A sub-transaction begins after the root and finishes before it
– Commit of a sub-transaction is conditional upon the commit of the root
– Top-level atomicity
• Open Nested Transactions
– Relaxation of top-level atomicity
– Partial results of sub-transactions are visible
• Sagas
• Split transactions
116
Advantages of Nested TM
• Higher level of concurrency
– Objects can be released after the sub-transaction completes
• Independent recovery of sub-transactions
– Damage is limited to a smaller part, making it less costly to recover
• Creating new transactions from existing ones
117
Workflow (long duration) Transactions
• Generic definition:
– A workflow is an automation of a business process
• In the context of transactions:
– A workflow is a (high level) activity that consists of a set of tasks with a well-defined precedence relationship between them
• Workflow tasks (sub-transactions) are allowed to commit individually (open nesting semantics), permitting partial results to be visible outside the workflow
• Various concepts have been introduced to overcome the problem of dealing with sub-transaction commit:
– Compensating tasks
– Critical tasks
– Contingency tasks
118
Sagas: Open and Long-duration Transactional Model
• A long duration transaction formed from:
– A collection of actions
– A graph whose nodes are either actions or one of {Abort, Complete}, called the terminal nodes
– An indication of the start, called the start node

[Figure: saga graph. start → A0 → A1 → A2 → A3 → complete; A1 → A4 → abort; A2 → A5 → abort]

Successful path: A0, A1, A2, A3
Unsuccessful paths: A0, A1, A4 and A0, A1, A2, A5

Optional for this course
119
Concurrency Control in Sagas
Concurrency control is managed by two facilities:
1. Each action "A" is itself a (short) transaction
• Uses conventional concurrency control such as locking
2. The overall transaction, which can be any path from the start node to one of the terminal nodes
• Uses the concept of compensating transactions
• A compensating transaction "rolls back" the effect of a committed action in a way that does not depend on what happened to the database between the time of the action and the time of the compensating transaction
• If A is any action, A⁻¹ is its compensating transaction, and α is any sequence of legal actions and compensating transactions, then A α A⁻¹ ≅ α
120
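The compensation idea can be sketched as follows. The action names and the bank-balance scenario are assumptions for illustration only: each committed action A has a compensating transaction A⁻¹, and on failure the committed prefix is compensated in reverse order, so the net effect is as if A never ran (A α A⁻¹ ≅ α).

```python
# Hypothetical database state for the sketch
balance = {"amount": 100}

def debit(db): db["amount"] -= 30        # action A (committed individually)
def debit_inv(db): db["amount"] += 30    # compensating transaction A^-1

saga = [(debit, debit_inv)]
done = []                                # compensators for committed actions
try:
    for action, comp in saga:
        action(balance)
        done.append(comp)
        raise RuntimeError("later action failed")  # simulate a downstream failure
except RuntimeError:
    for comp in reversed(done):          # compensate in reverse order
        comp(balance)

print(balance["amount"])  # 100: net effect as if debit never ran
```

Note that unlike an undo in classical recovery, the compensator runs as a regular transaction after A has already committed.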
Distributed Concurrency Control
• Distributed Concurrency Control through Timestamps
• Distributed Concurrency Control through Locking
121
Centralized Timestamp Ordering
• For each item accessed by conflicting operations in a schedule, the order of item access does not violate the serializability order.
• Each item has:
– read-TS(X) = TS(T)
– write-TS(X) = TS(T)
where T is the transaction that most recently read (wrote) X
122
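The basic timestamp-ordering check can be sketched as below. This is a simplified illustration with assumed function names, not the lecture's algorithm: a read by T is rejected if a younger transaction has already written X; a write is rejected if a younger transaction has already read or written X.

```python
read_ts, write_ts = {}, {}   # read-TS(X) and write-TS(X) per item

def read_item(x, ts):
    if ts < write_ts.get(x, 0):
        return False                         # too late: T must abort/restart
    read_ts[x] = max(read_ts.get(x, 0), ts)  # record the youngest reader
    return True

def write_item(x, ts):
    if ts < read_ts.get(x, 0) or ts < write_ts.get(x, 0):
        return False                         # a younger Xact already accessed X
    write_ts[x] = ts
    return True

print(write_item("X", 10))  # True
print(read_item("X", 5))    # False: T with TS 5 arrives after T with TS 10 wrote X
print(read_item("X", 12))   # True
```

In the distributed setting the TM attaches the timestamp and the SC at each site performs exactly this kind of check.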
Distributed Timestamp Ordering

[Figure: distributed execution monitor, as on slide 112, with the Transaction Manager (TM) and the Scheduler (SC)]

The TM is responsible for assigning a timestamp to each new transaction and attaching it to each database operation passed to the scheduler.
The SC is responsible for keeping track of read and write timestamps and performing serializability checks.
123
Centralized 2 Phase Locking
A transaction follows the two-phase protocol if all locking operations precede the first unlock operation.
Phase 1 (Growing): read-lock(X); write-lock(X); write-lock(Y); read-lock(Y)
Phase 2 (Shrinking): unlock(X); unlock(Y)
124
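The two-phase rule can be checked mechanically; the sketch below (function and operation names are assumed for illustration) returns True iff no lock request follows the first unlock.

```python
def follows_2pl(ops):
    """ops: list of ('lock' | 'unlock', item) pairs in execution order.
    True iff all locks precede the first unlock (growing then shrinking)."""
    unlocked = False
    for kind, _ in ops:
        if kind == "unlock":
            unlocked = True
        elif unlocked:          # a lock request after an unlock violates 2PL
            return False
    return True

ok  = [("lock", "X"), ("lock", "Y"), ("unlock", "X"), ("unlock", "Y")]
bad = [("lock", "X"), ("unlock", "X"), ("lock", "Y"), ("unlock", "Y")]
print(follows_2pl(ok), follows_2pl(bad))  # True False
```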
Distributed 2 Phase Locking

[Figure: distributed execution monitor with a Lock Manager (LM) in place of the scheduler. The TM communicates with other TMs; the LM communicates with other LMs and the data processors]

Differences:
• The TM sends requests to the lock managers of ALL participating sites
• Operations are passed to the data processors by local as well as REMOTE participating lock managers
125
Distributed Locking
Main difference from centralized locking: replication control (logical elements require global locks)
How do we manage locks for objects across many sites?
– Centralized
– Primary Copy
– Fully Distributed
126
Centralized 2PL
• One site is designated as the lock site or primary site
• The primary site is delegated the responsibility for lock management
• Transaction managers at participating sites deal with the primary site lock manager rather than their own
Performance bottleneck at the primary site
Vulnerable to single site failure

[Figure: message flow (steps 1-5) among the coordinating TM, the primary site LM, and the DPs at the participating sites]
127
Primary Copy 2PL
• The function of the primary site is distributed (thus avoiding the bottleneck at the primary site)
• Lock managers at a number of sites
• Each lock manager is responsible for managing locks for a given set of data elements
• The copy of a data element at this site is called the primary copy; that is, each logical element still has a single site responsible for maintaining its global lock
• Access to a data element will require access to the transaction site as well as the primary copy site
128
Distributed 2PL
• Lock managers at each site
• If there is no replication, Distributed 2PL is the same as Primary Copy 2PL
• When there is replication, a transaction must acquire a global lock (no copy of a data element is primary)

When can a transaction assume it has a (shared or exclusive) global lock on a data element?
129
Distributed CC Based on Voting
• Majority Locking
– If a data item Q is replicated at n different sites, then a lock-request message must be sent to more than one-half of the n sites at which Q is stored.
– The requesting transaction does not operate on Q until it has obtained a lock on a majority of the replicas of Q
• Read One, Write All Locking
– When a transaction needs to read/shared-lock a data item Q, it requests a lock from the lock manager at one site that contains a replica of Q
– When a transaction needs to write/exclusive-lock a data item Q, it requests a lock on Q from the lock managers of all sites that contain a replica of Q
Go to 135
130
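The two voting rules above reduce to simple counting; the sketch below (function names are assumed for illustration) shows that majority locking needs strictly more than n/2 grants, while read-one-write-all needs one site for a shared lock and all n sites for an exclusive lock.

```python
def majority_granted(grants, n):
    """Majority locking: Q is lockable once strictly more than half
    of its n replicas have granted the lock."""
    return grants > n // 2

def rowa_sites_needed(mode, n):
    """Read One, Write All: 1 site for a read lock, all n for a write lock."""
    return 1 if mode == "read" else n

print(majority_granted(3, 5))   # True: 3 of 5 replicas is a majority
print(majority_granted(2, 5))   # False
print(rowa_sites_needed("read", 5), rowa_sites_needed("write", 5))  # 1 5
```

The trade-off: majority locking tolerates unavailable replicas for both reads and writes, while ROWA makes reads cheap but blocks writes if any replica site is down.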
Deadlocks - Revisited
Locking-based concurrency control may result in deadlocks!
Deadlock: each transaction in a set (of two or more transactions) is waiting for an item that is locked by another transaction in the set.

Partial Schedule S:
T1: read-lock(Y); read-item(Y); ... then requests write-lock(X) and waits
T2: read-lock(X); read-item(X); ... then requests write-lock(Y) and waits

Wait-for Graph for S: T1 → T2 → T1
131
Distributed Deadlock Detection
• Each site maintains a local waits-for graph.
• A global deadlock might exist even if the local graphs contain no cycles:

SITE A: T1 → T2
SITE B: T2 → T1
GLOBAL: T1 → T2 → T1 (cycle)
132
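The situation above can be sketched directly: union the local waits-for graphs and look for a cycle. The graph representation and function names below are assumptions for illustration; the point is that neither local graph alone contains a cycle, but their union does.

```python
site_a = {"T1": {"T2"}}   # at site A, T1 waits for T2
site_b = {"T2": {"T1"}}   # at site B, T2 waits for T1

def union(*graphs):
    """Merge local waits-for graphs into one global graph."""
    g = {}
    for graph in graphs:
        for t, waits in graph.items():
            g.setdefault(t, set()).update(waits)
    return g

def has_cycle(g):
    """Depth-first search for a cycle in the waits-for graph."""
    def visit(node, path):
        if node in path:
            return True
        return any(visit(n, path | {node}) for n in g.get(node, ()))
    return any(visit(t, set()) for t in g)

print(has_cycle(site_a), has_cycle(site_b))  # False False
print(has_cycle(union(site_a, site_b)))      # True: global deadlock
```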
Distributed Deadlock Detection
Three Solutions:
• Centralized
– One site is designated as the deadlock detector
– Lock managers at the other sites periodically send their local wait-for graphs to the deadlock detector, which forms a global wait-for graph
– The length of the interval is a system design decision
– The approach is vulnerable to failure and has high communication overhead
• Hierarchical
• Timeout
133
Distributed Deadlock Detection
Three Solutions:
• Centralized
• Hierarchical
– Sites are organized into a hierarchy of deadlock detectors
– Local graphs are sent to the parent in the hierarchy
– Reduces communication overhead and dependence on a central deadlock detection site
• Timeout

[Figure: hierarchy with DD0x at the root, DD11 and DD14 below it, and DD21, DD22, DD23, DD24 attached to Sites 1-4]
134
Distributed Deadlock Detection
Three Solutions:
• Centralized
• Hierarchical
• Timeout
– Any transaction that has not completed after an appropriate amount of time is assumed to have become deadlocked, and is aborted
135
Distributed Reliability Issues
• Failures in Distributed Systems
• Distributed Reliability Protocols
136
Failures in Distributed Systems
• Transaction Failures
• Site (System) Failures
• Media Failures
• Communication Failures
137
Centralized Recovery
• Main Techniques
– Deferred Update
– Immediate Update
• Concepts
– Disk Caching
– System Log
– Checkpointing
138
Local Recovery Manager

[Figure: the Local Recovery Manager and the Database Buffer Manager sit in main memory, holding the log buffers and the database buffers (the volatile database); the buffer manager fetches and flushes pages between the buffers and secondary storage, which holds the stable log and the stable database; reads and writes flow between these components]
139
Distributed Reliability Protocols
• Commit Protocols
– Maintain atomicity of distributed transactions.
– Atomic Commitment: ensures that the effect of the transaction on the distributed database is all-or-nothing, even though the distributed transaction may involve multiple sites
• Recovery Protocols
– The procedure a failed site has to go through to recover its state after restart.
– Independent Recovery Protocols: protocols that determine how to terminate a transaction that was executing at a failed site without having to consult any other site.
• Termination Protocols
– The procedure that the operational sites go through when a participating site fails.
– Non-blocking Termination Protocols: protocols that determine how to allow the transaction at the operational sites to terminate without waiting for the failed site to recover.
140
Commit Protocols
1 Phase Commit (1PC)
– Ready: all operations of the transaction completed except commit
[State diagram: Initial → Ready (on execute); Ready → Committed (on commit); Ready → Failed (on abort)]

2 Phase Commit (2PC)
– Prepare, then apply the Global Commit Rule:
– Global abort: if one participant votes to abort
– Global commit: if all participants vote to commit
[State diagram: Initial → Ready (on execute); Ready → Prepared (on prepare); Prepared → Committed (on commit); Prepared → Failed (on abort)]
141
Centralized 2P Commit Protocols

[Figure: Phase 1: the coordinator sends prepare to the participants, and each participant returns its vote. Phase 2: the coordinator sends the global commit (or abort) decision, and the participants commit (or abort)]
142
Two-Phase Commit (2PC)
• The site at which the Xact originates is the coordinator; the other sites at which it executes are subordinates.
• When an Xact wants to commit:
1. The coordinator sends a prepare msg to each subordinate.
2. Each subordinate force-writes an abort or prepare log record and then sends a no or yes msg to the coordinator.
3. If the coordinator gets unanimous yes votes, it force-writes a commit log record and sends a commit msg to all subs. Else, it force-writes an abort log rec and sends an abort msg.
4. Subordinates force-write an abort/commit log rec based on the msg they get, then send an ack msg to the coordinator.
5. The coordinator writes an end log rec after getting all acks.
143
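The coordinator's decision in steps 2-3 can be sketched as below. This is an in-memory stand-in for messages and force-written log records (the function name and record shape are assumed for illustration): unanimous yes votes produce a global commit, anything else a global abort, and the decision record is written before any decision msg would be sent.

```python
def coordinator_decide(votes, log):
    """votes: subordinate -> 'yes' | 'no'.
    Force-writes the commit/abort record, then returns the global decision."""
    decision = "commit" if all(v == "yes" for v in votes.values()) else "abort"
    log.append((decision, sorted(votes)))  # record also lists the subordinates
    return decision

log = []
print(coordinator_decide({"s1": "yes", "s2": "yes"}, log))  # commit
print(coordinator_decide({"s1": "yes", "s2": "no"}, log))   # abort
```

A single no vote (or a missing vote treated as no) is enough to force a global abort, which is exactly why any site can unilaterally abort an Xact before voting yes.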
Distributed 2P Commit Protocols

[Figure: Phase 1: the coordinator sends prepare, and the participants broadcast their votes to one another, so the global commit decision is made independently at each participant. There is no Phase 2]
144
Comments on 2PC
• Two rounds of communication: first voting, then termination. Both are initiated by the coordinator.
• Any site can decide to abort an Xact.
• Every msg reflects a decision by the sender; to ensure that this decision survives failures, it is first recorded in the local log.
• All commit protocol log recs for an Xact contain the Xact id and the Coordinator id. The coordinator's abort/commit record also includes the ids of all subordinates.
145
Restart After a Failure at a Site
• If we have a commit or abort log rec for Xact T, but not an end rec, we must redo/undo T.
– If this site is the coordinator for T, keep sending commit/abort msgs to the subs until acks are received.
• If we have a prepare log rec for Xact T, but no commit/abort rec, this site is a subordinate for T.
– Repeatedly contact the coordinator to find the status of T, then write a commit/abort log rec; redo/undo T; and write an end log rec.
• If we don't have even a prepare log rec for T, unilaterally abort and undo T.
– This site may be the coordinator! If so, subs may send msgs.
146
Blocking
• If the coordinator for Xact T fails, subordinates who have voted yes cannot decide whether to commit or abort T until the coordinator recovers.
– T is blocked.
– Even if all subordinates know each other (extra overhead in the prepare msg), they are blocked unless one of them voted no.
147
Link and Remote Site Failures
• If a remote site does not respond during the commit protocol for Xact T, either because the site failed or the link failed:
– If the current site is the coordinator for T, it should abort T.
– If the current site is a subordinate and has not yet voted yes, it should abort T.
– If the current site is a subordinate and has voted yes, it is blocked until the coordinator responds.
148
2PC with Presumed Abort
1. Ack msgs are used to let the coordinator know when it can "forget" an Xact (T is kept in the Xact Table until it receives all acks).
2. If the coordinator fails after sending prepare msgs but before writing commit/abort log recs, when it comes back up it aborts the Xact. Meanwhile, if another site inquires about T, the recovery process will direct it to abort T. In the absence of information, T is presumed to have aborted.
– When the coordinator aborts T, it undoes T and removes it from the Xact Table immediately, resulting in a "no information" state
– Similarly, subordinates do not need to send acks on abort. If a subordinate had voted yes previously, a later inquiry will find no information and consequently abort.
– The names of the subs are not recorded in the abort log rec.
3. If a subtransaction does no updates, its commit or abort status is irrelevant.
– If a subXact does no updates, it responds to the prepare msg with reader instead of yes/no.
– The coordinator subsequently ignores readers.
– If all subXacts are readers, the 2nd phase is not needed.
149
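The presumed-abort rule itself reduces to one lookup; the sketch below (table and function names are assumed for illustration) shows how an inquiry about a transaction that is absent from the coordinator's Xact Table is answered with abort.

```python
# Hypothetical coordinator state: only in-progress Xacts are remembered.
xact_table = {"T1": "committing"}   # T1 is committed but still awaiting acks

def answer_inquiry(t):
    """No information about t means it is presumed to have aborted."""
    return xact_table.get(t, "abort")

print(answer_inquiry("T1"))  # committing: still in the Xact Table
print(answer_inquiry("T2"))  # abort: forgotten (or never known), so presumed aborted
```

This is what makes it safe to drop abort records early: forgetting an aborted Xact and never having known it produce the same answer.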
Commit Protocols
• 3 Phase Commit Protocol: avoids the possibility of blocking in a restricted case of possible failures:
– no network partition can occur,
– at any point at least one site must be up,
– at any point at most K participating sites can fail
Interesting but optional for this course
150
Distributed Recovery
• Two new issues:
– New kinds of failure, e.g., links and remote sites.
– If "sub-transactions" of an Xact execute at different sites, all or none must commit. We need a commit protocol to achieve this.
• A log is maintained at each site, as in a centralized DBMS, and commit protocol actions are additionally logged.