Post on 21-Jul-2016
Distributed Databases
Presentation-I
Mr. Gadakh Prashant J.
UNIT-II: DISTRIBUTED DATABASES
• Study of DDBMS architectures
• Comparison of homogeneous and heterogeneous databases
• Analysis of concurrency control in distributed databases
• Implementation of distributed query processing
• Distributed data storage
• Distributed transactions
• Commit protocols, availability
• Distributed query processing
• Directory systems (LDAP)
• Distributed data storage and transactions
Distributed Database
In a distributed database system, the database is stored on several computers.
The computers in a distributed system communicate with one another through various communication media.
They do not share main memory or disks.
Contrast with Parallel Databases
Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.
INTRODUCTION
The computers in a distributed system are referred to by a number of different names, such as sites or nodes, depending on the context in which they are mentioned.
We mainly use the term site, to emphasize the physical distribution of these systems.
Objectives
Sharing data
Availability
Location transparency
An Example of a Distributed Database
Consider a banking system consisting of four branches in four different cities. Each branch has its own computer, with a database of all the accounts maintained at that branch. Each such installation is thus a site. There also exists one single site that maintains information about all the branches of the bank. Each branch maintains (among others) a relation account(Account-schema), where
Account-schema = (account-number, branch-name, balance)
The site containing information about all the branches of the bank maintains the relation branch(Branch-schema), where
Branch-schema = (branch-name, branch-city, assets)
There are other relations maintained at the various sites; we ignore them for the purpose of our example.
To illustrate the difference between the two types of transactions, local and global, at the sites, consider a transaction to add $50 to account number A-177 located at the Valleyview branch. If the transaction was initiated at the Valleyview branch, then it is considered local; otherwise, it is considered global. A transaction to transfer $50 from account A-177 to account A-305, which is located at the Hillside branch, is a global transaction, since accounts in two different sites are accessed as a result of its execution.
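The local/global distinction can be sketched in a few lines of Python. This is only an illustration of the rule stated above: the `ACCOUNT_SITE` mapping, the `Transaction` class, and the `classify` function are hypothetical names, not part of any DBMS.

```python
from dataclasses import dataclass

# Hypothetical placement of accounts at sites, taken from the banking example.
ACCOUNT_SITE = {"A-177": "Valleyview", "A-305": "Hillside"}


@dataclass(frozen=True)
class Transaction:
    initiating_site: str   # site where the transaction was started
    accounts: tuple        # account numbers the transaction accesses


def classify(txn: Transaction) -> str:
    """Return 'local' if all accessed accounts live at the initiating site."""
    sites_touched = {ACCOUNT_SITE[a] for a in txn.accounts}
    return "local" if sites_touched == {txn.initiating_site} else "global"


# Add $50 to A-177, started at Valleyview: only the home site is touched.
deposit_at_valleyview = Transaction("Valleyview", ("A-177",))
# The same deposit started at another branch.
deposit_from_hillside = Transaction("Hillside", ("A-177",))
# Transfer from A-177 to A-305: two different sites are accessed.
transfer = Transaction("Valleyview", ("A-177", "A-305"))

for txn in (deposit_at_valleyview, deposit_from_hillside, transfer):
    print(txn.accounts, "started at", txn.initiating_site, "->", classify(txn))
```

Only the first transaction is classified as local; the other two touch a site other than the initiating one.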
Types of Distributed Database
Homogeneous distributed database system
Heterogeneous distributed database system
In a homogeneous distributed database system:
All the sites have identical database management system software.
The sites are aware of each other and agree to cooperate in processing user requests.
The system appears to the user as a single system.
[Figure: Homogeneous distributed database system. Sites running identical DBMS software over one distributed database.]
In a heterogeneous distributed database system:
Different sites may use different schemas and software.
Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing.
[Figure: Heterogeneous distributed database system. Sites running non-identical DBMS software over one distributed database.]
Distributed DBMS Architectures
Client-server system
Collaborating server system
Middleware system
Client-Server System
It has one or more client processes and one or more server processes.
Clients are responsible for user interface issues, and servers manage data and execute transactions.
[Figure: Client-server system with Client #1, Client #2, Client #3, Server #1, Server #2, and two databases.]
Collaborating Server System
In this system we have a collection of database servers.
When a server receives a query that requires access to data at other servers, it generates appropriate subqueries to be executed by other servers and puts the results together to compute answers to the original query.
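The subquery-and-merge behaviour described above can be sketched as follows. This is a toy in-memory model under stated assumptions: the `Server` class, its methods, the account A-226, and all balances are made up for illustration, and real servers would exchange subqueries over a network rather than by direct method calls.

```python
class Server:
    """A toy database server holding a local fragment of the account relation."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows    # local data: a list of dicts
        self.peers = []     # other collaborating servers

    def run_local(self, predicate):
        """Execute a (sub)query against local data only."""
        return [row for row in self.rows if predicate(row)]

    def query(self, predicate):
        """Answer a query that may need data held at other servers."""
        results = self.run_local(predicate)
        for peer in self.peers:
            # Generate a subquery for each peer and merge its answer.
            results.extend(peer.run_local(predicate))
        return results


# Hypothetical data: balances and account A-226 are invented for the sketch.
valleyview = Server("Valleyview", [
    {"account_number": "A-177", "branch_name": "Valleyview", "balance": 205},
])
hillside = Server("Hillside", [
    {"account_number": "A-305", "branch_name": "Hillside", "balance": 500},
    {"account_number": "A-226", "branch_name": "Hillside", "balance": 336},
])
valleyview.peers = [hillside]
hillside.peers = [valleyview]

# The query arrives at Valleyview but needs rows stored at Hillside as well.
answer = valleyview.query(lambda row: row["balance"] >= 200)
print([row["account_number"] for row in answer])
```

Any server can act as the entry point here, which is the main difference from a client-server arrangement where a query is sent to one server that holds all the relevant data.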
[Figure: Collaborating server system. Several servers, with a query going in and the query result coming back.]
Middleware Systems
The idea is that we need just one database server that is capable of managing queries and transactions spanning multiple servers; the remaining servers only need to handle local queries and transactions.
We can think of this special server as a layer of software that coordinates the execution of queries and transactions across one or more independent database servers; such software is often called middleware.
Distributed Data Storage
Consider a relation r that is to be stored in the database. There are two approaches to storing this relation in the distributed database:
• Replication. The system maintains several identical replicas (copies) of the relation, and stores each replica at a different site. The alternative to replication is to store only one copy of relation r.
• Fragmentation. The system partitions the relation into several fragments, and stores each fragment at a different site.
Storing Data in a Distributed System
Fragmentation: It consists of breaking a relation into smaller relations or fragments and storing the fragments, possibly at different sites.
1. Horizontal fragmentation
2. Vertical fragmentation
Horizontal Fragmentation
Each fragment consists of a subset of rows of the original relation.
Tuples that belong to a given horizontal fragment are identified by a selection query.
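As a minimal sketch of horizontal fragmentation, the snippet below splits the account relation from the banking example by branch, using a selection predicate per fragment. The function name `horizontal_fragment`, the account A-226, and all balances are assumptions made for illustration.

```python
# Hypothetical account relation: balances and A-226 are invented sample data.
account = [
    {"account_number": "A-177", "branch_name": "Valleyview", "balance": 205},
    {"account_number": "A-305", "branch_name": "Hillside", "balance": 500},
    {"account_number": "A-226", "branch_name": "Hillside", "balance": 336},
]


def horizontal_fragment(relation, predicate):
    """Selection: keep the whole rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# One fragment per branch; each could be stored at that branch's site.
valleyview_rows = horizontal_fragment(
    account, lambda row: row["branch_name"] == "Valleyview")
hillside_rows = horizontal_fragment(
    account, lambda row: row["branch_name"] == "Hillside")


def by_number(row):
    return row["account_number"]


# The original relation is recovered as the union of the fragments.
reconstructed = sorted(valleyview_rows + hillside_rows, key=by_number)
print(reconstructed == sorted(account, key=by_number))
```

Because the two predicates are disjoint and together cover every row, the union loses nothing and introduces no duplicates.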
Vertical Fragmentation
Each fragment consists of a subset of columns of the original relation.
Tuples that belong to a given vertical fragment are identified by a projection query.
Original relation:
Name     Roll No    Semester   Address     %
Amit     19991012   8          Lucknow     68
Dhruv    19991020   8          Ghaziabad   63
Rishi    19991041   8          Dehradun    61
Amber    19991011   8          Mumbai      64
Anurag   19991014   8          Banaras     60

Vertical fragmentation:
Name     Roll No
Amit     19991012
Dhruv    19991020
Rishi    19991041
Amber    19991011
Anurag   19991014

Horizontal fragmentation:
Name     Roll No    Semester   Address     %
Rishi    19991041   8          Dehradun    61
Amber    19991011   8          Mumbai      64
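Vertical fragmentation of the student relation in this section can be sketched as a projection onto chosen columns. The column names, the helper functions, and the choice to repeat the roll number in every fragment so the relation can be rebuilt by a join are assumptions of this sketch, not something the slides specify.

```python
# Student relation from this section, with illustrative column names.
student = [
    {"name": "Amit", "roll_no": 19991012, "semester": 8, "address": "Lucknow", "percent": 68},
    {"name": "Dhruv", "roll_no": 19991020, "semester": 8, "address": "Ghaziabad", "percent": 63},
    {"name": "Rishi", "roll_no": 19991041, "semester": 8, "address": "Dehradun", "percent": 61},
    {"name": "Amber", "roll_no": 19991011, "semester": 8, "address": "Mumbai", "percent": 64},
    {"name": "Anurag", "roll_no": 19991014, "semester": 8, "address": "Banaras", "percent": 60},
]


def vertical_fragment(relation, columns):
    """Projection: keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in relation]


def join_on(left, right, key):
    """Rebuild rows by matching the two fragments on a shared key column."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left]


# Fragment 1 matches the (Name, Roll No) fragment shown in this section.
names = vertical_fragment(student, ["name", "roll_no"])
# Fragment 2 holds the remaining columns plus the key (an assumption here).
details = vertical_fragment(student, ["roll_no", "semester", "address", "percent"])

rebuilt = join_on(names, details, "roll_no")
print(rebuilt == student)
```

Keeping the key in both fragments is what makes the reconstruction lossless; without a shared column there would be no way to pair a name with its address.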
Replication
The system maintains several identical replicas (copies) of the relation, and stores each replica at a different site.
Advantages:
Availability
Faster query evaluation
Disadvantages:
Increased overhead on update
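The trade-off listed above can be made concrete with a small sketch, assuming a simple policy where every update is applied to all replicas and a read is served by any reachable replica. The `ReplicatedRelation` class, its counter, the site names, and the balance are hypothetical.

```python
class ReplicatedRelation:
    """Toy model: one full copy of a key-value relation per site."""

    def __init__(self, sites):
        self.replicas = {site: {} for site in sites}
        self.update_messages = 0   # how many replicas were written in total

    def write(self, key, value):
        # Every replica must be updated, so update cost grows with replica count.
        for store in self.replicas.values():
            store[key] = value
            self.update_messages += 1

    def read(self, key, failed_sites=()):
        # Any surviving replica can answer, which is the availability benefit.
        for site, store in self.replicas.items():
            if site not in failed_sites:
                return store[key]
        raise RuntimeError("no replica is reachable")


relation = ReplicatedRelation(["Valleyview", "Hillside", "Headquarters"])
relation.write("A-177", 255)   # hypothetical balance

print(relation.update_messages)                            # one write per replica
print(relation.read("A-177", failed_sites={"Valleyview"}))  # still answerable
```

With three replicas a single logical update costs three physical writes, while a read survives the failure of any two sites.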
Distributed Query Processing
In a distributed system, we must take into account several other matters, including:
• The cost of data transmission over the network
• The potential gain in performance from having several sites process parts of the query in parallel
The relative cost of data transfer over the network and data transfer to and from disk varies widely depending on the type of network and on the speed of the disks. Thus, in general, we cannot focus solely on disk costs or on network costs. Rather, we must find a good tradeoff between the two.
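A rough cost model shows why neither disk cost nor network cost can be ignored. The sketch below is deliberately simplistic: it assumes cost is bytes divided by transfer rate, with no overlap between disk and network work, and the two plans and all byte counts and rates are invented numbers.

```python
def plan_cost(disk_bytes, network_bytes, disk_rate, network_rate):
    """Estimated seconds for a plan: disk time plus network time (no overlap)."""
    return disk_bytes / disk_rate + network_bytes / network_rate


# Two hypothetical plans for the same query (all sizes are made up).
PLANS = {
    # Ship the whole relation to one site and process it there.
    "ship_relation": {"disk_bytes": 100e6, "network_bytes": 100e6},
    # Do more disk work locally, then ship only a small intermediate result.
    "local_filter": {"disk_bytes": 300e6, "network_bytes": 5e6},
}


def cheapest_plan(disk_rate, network_rate):
    """Return the name of the plan with the lowest estimated cost."""
    return min(
        PLANS,
        key=lambda name: plan_cost(disk_rate=disk_rate,
                                   network_rate=network_rate,
                                   **PLANS[name]),
    )


# Same disks (100 MB/s), two very different networks.
print(cheapest_plan(disk_rate=100e6, network_rate=1e6))     # slow network
print(cheapest_plan(disk_rate=100e6, network_rate=1000e6))  # fast network
```

On the slow network the plan that minimizes transmitted bytes wins, and on the fast network the plan that minimizes disk work wins, so the best plan depends on the ratio between the two rates.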
Continued in the next presentation.