Distributed Databases
Outline
Distributed DBMS (DDBMS)
Features
Detailed insight of its objectives
Distributed DBMS (1)
A distributed DBMS coordinates data access at the various nodes of a distributed database. Although each site has its own DBMS to manage the local database at that site, a distributed DBMS also performs the following functions:
Keep track of data location in a data dictionary.
Determine location from which requested data can be retrieved.
Determine where each part of a distributed query can be processed
Distributed DBMS (2)
If required, translate a request at one node (using a local DBMS) into the proper request for another node (using a different DBMS and data model), and return the data to the requesting node in a format it accepts.
Provide data management functions such as security, deadlock control, concurrency control, and recovery.
Provide consistency among data copies.
Use of global primary keys.
Scalability, replication and transparency.
Permit different DBMS on different nodes.
A Popular DDBMS Architecture
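DDBMS Architecture Description (1)
The architecture shown on the previous slide is a system in which each site has a local DBMS, a copy of the DDBMS, and an associated distributed data dictionary/directory (DD/D), or data repository.
The DD/D holds the data definitions and information about the location of distributed data.
User requests for data are first processed by the DDBMS, which determines whether a local or a global transaction is involved. A local transaction needs only data stored at the local site; a global transaction needs data at one or more other sites.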
DDBMS Architecture Description (2)
For local transactions, the DDBMS passes the request to the local DBMS.
For global transactions, the DDBMS routes the request to the other sites involved.
This architecture assumes that a copy of the distributed DBMS and the DD/D exists at each site.
Alternatively, they may be kept at a central site, but that setup is vulnerable to failure of the central site.
Other strategies are also possible.
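The local-versus-global decision described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the DDD dictionary, the table names, the site labels, and the classify function are hypothetical and do not come from any real DDBMS.

```python
# Hypothetical DD/D: each table name maps to the set of sites holding a copy.
DDD = {
    "customer": {"C"},
    "price_list": {"A", "B", "C"},
}

def classify(local_site, tables, ddd=DDD):
    """Classify a request as local or global from the requesting site's view.

    Returns ("local", set()) when every table is available at local_site,
    otherwise ("global", remote_sites), where remote_sites are the sites
    the request must be routed to.
    """
    remote_sites = set()
    for table in tables:
        holders = ddd[table]
        if local_site not in holders:
            # Assumption: any holder will do; min() only makes the choice deterministic.
            remote_sites.add(min(holders))
    if remote_sites:
        return ("global", remote_sites)
    return ("local", set())

local_result = classify("A", ["price_list"])
global_result = classify("A", ["customer", "price_list"])
```

A request from Region A that touches only the fully replicated price list is classified as local, while one that also needs the customer table is classified as global and routed to Region C.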
Key Objectives of DDBMS
Location Transparency
Replication Transparency
Failure Transparency
Commit Protocol
Concurrency Transparency
Query Optimization
Location Transparency
Location transparency means that a user of the data need not know where the data are physically located.
Unaware of how the data are distributed, the user sees what appears to be a single physical database.
Example:
A user at Region A wants to see the customers whose total sales have exceeded $10,000:
SELECT * FROM customer WHERE total_sales > 10000
The distributed DBMS at the local site (Region A) consults the distributed DD/D and determines that this request must be routed to Region C.
When the results are displayed, it appears to the user as if the data were retrieved locally (unless there is a long communication delay).
Data Distribution Example
[Figure: data distributed across Region A, Region B, and Region C]
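The customer query in the location transparency example can be mimicked with a small Python sketch. Everything named here (DDD, SITE_DATA, ROUTES, the select function, and the sample customer rows) is made up for illustration; the point is only that the caller never names the site that holds the data.

```python
# Hypothetical DD/D: which site holds each table.
DDD = {"customer": "C"}

# Hypothetical contents of each site's local database.
SITE_DATA = {
    "A": {},
    "B": {},
    "C": {"customer": [
        {"name": "Acme", "total_sales": 12500},
        {"name": "Birch", "total_sales": 8000},
    ]},
}

ROUTES = []  # (requesting_site, target_site) pairs, kept only for inspection

def select(requesting_site, table, predicate):
    """Run a selection without the caller naming the site that holds the data."""
    target_site = DDD[table]  # looked up by the DDBMS, not supplied by the user
    ROUTES.append((requesting_site, target_site))
    rows = SITE_DATA[target_site][table]
    return [row for row in rows if predicate(row)]

# Counterpart of: SELECT * FROM customer WHERE total_sales > 10000, issued at Region A
big_customers = select("A", "customer", lambda row: row["total_sales"] > 10000)
```

The user at Region A receives the matching rows, and only the ROUTES list reveals that the request was actually served by Region C.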
Replication Transparency (1)
Although a given data item may be replicated at several nodes in the network, with replication transparency a user may treat it as if it were a single item at a single node.
This is sometimes also called fragmentation transparency.
Example:
A user wants to view the Price List file.
An identical copy of this file exists at all three nodes (the three regions), which is a case of full replication.
The distributed DBMS consults the DD/D and determines that this is a local transaction.
The user need not be aware that the same data are stored at other sites.
Replication Transparency (2)
If the requested data are not at the local site, the distributed DBMS decides the optimum route to the target site.
Now consider one or more users trying to update replicated data, for example a user at Region C who wants to change the price of a part.
For data integrity, this update must be reflected at every site that holds a copy.
With replication transparency, the DDBMS rather than the user is responsible for updating every copy, so the user issues the change only once.
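A rough Python sketch of the read and update behavior described above follows. The ReplicatedFile class, its methods, and the sample part and prices are assumptions made for illustration, and the sketch deliberately ignores site and link failures.

```python
import copy

class ReplicatedFile:
    """One logical file stored as identical copies at several sites."""

    def __init__(self, sites, contents):
        # Each site gets its own independent copy of the contents.
        self.copies = {site: copy.deepcopy(contents) for site in sites}

    def read(self, local_site, key):
        # Prefer the local copy; fall back to some other copy if none is local.
        site = local_site if local_site in self.copies else next(iter(self.copies))
        return self.copies[site][key]

    def write(self, key, value):
        # The caller updates once; every copy is changed.
        for contents in self.copies.values():
            contents[key] = value

price_list = ReplicatedFile(["A", "B", "C"], {"bolt": 1.50})
price_list.write("bolt", 1.75)  # issued once, by a user at Region C
```

After the single write, a read at any region returns the new price, which is the behavior a user expects from a file that appears to exist only once.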
Failure Transparency (1)
Each site in a distributed environment is prone to the same types of failure as a centralized system (disk failure, erroneous data, and so on).
An additional risk is failure of a communication link.
A robust system must be able to:
1. Detect a failure
2. Reconfigure the system so that computation can continue
3. Recover when a processor or link is repaired
The first and second tasks are the job of the communications controller or processor; the third is the job of the DDBMS.
For this purpose the DDBMS has a software module called the transaction manager.
Failure Transparency (2)
Transaction manager:
Maintains a log of transactions and of before and after images of the database
Maintains a concurrency control scheme to ensure data integrity during parallel execution of transactions at that site
For global transactions, cooperates with the transaction managers at the other participating sites to ensure that all update operations are synchronized
Example:
A person at site 1 wants to change a part's price in the StandardPriceList (copies are present at three different sites).
This global transaction must update every copy of the record for that part at the three sites.
Suppose site 1 and site 2 are updated successfully, but because of a transmission failure the update does not occur at site 3.
With failure transparency, either all actions of a transaction are committed or none of them are, so in this case the updates at site 1 and site 2 must be undone.
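The role of the log's before images in the price-change example can be illustrated with a minimal Python sketch. The SiteTransactionManager class, the transaction id T1, the part key, and the prices are all hypothetical; a real transaction manager would also write its log to stable storage.

```python
_MISSING = object()  # marks "key did not exist" in a before image

class SiteTransactionManager:
    def __init__(self, data):
        self.data = dict(data)
        self.log = []  # entries: (txn_id, key, before_image, after_image)

    def update(self, txn_id, key, new_value):
        before = self.data.get(key, _MISSING)
        self.log.append((txn_id, key, before, new_value))  # log first, then apply
        self.data[key] = new_value

    def undo(self, txn_id):
        # Restore before images in reverse order of the updates.
        for logged_txn, key, before, _after in reversed(self.log):
            if logged_txn != txn_id:
                continue
            if before is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = before

sites = {name: SiteTransactionManager({"part42": 10.00})
         for name in ("site1", "site2", "site3")}

# The update reaches site1 and site2, but a transmission failure keeps it from site3.
sites["site1"].update("T1", "part42", 12.00)
sites["site2"].update("T1", "part42", 12.00)

# Because not every site was updated, the sites that were updated roll back.
for name in ("site1", "site2"):
    sites[name].undo("T1")
```

After the rollback, all three copies hold the original price again, so the transaction has had no effect anywhere.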
Commit Protocol
To ensure data integrity for distributed update operations, the cooperating transaction managers execute a commit protocol.
A commit protocol ensures that a global transaction is either successfully completed at all sites or else aborted.
Two-phase commit (an algorithm):
Is the most widely used commit protocol
Coordinates updates in a distributed environment
Ensures that concurrent transactions at multiple sites are processed as though they were executed in the same serial order at all sites
How does it work?
How does the protocol work? (1)
The site originating the global transaction sends a request to each of the sites that will process some portion of the transaction.
Each site processes its sub-transaction but does not commit the result to the database (it holds the result in a temporary file).
Each site locks the portion of the database being updated.
Each site notifies the originating site when its sub-transaction is complete.
When all sites have responded, the originating site initiates the two-phase commit protocol:
Preparation phase
Final commit phase
How does the protocol work? (2)
Preparation phase
A message is broadcast to every participating site, asking whether it is willing to commit its portion of the transaction.
Each site returns OK or not OK.
The originating site collects all the messages.
How does the protocol work? (3)
Final commit phase
If all sites respond OK, the originating site broadcasts a message to all sites to commit the portion of the transaction handled at each site.
If one or more sites respond not OK, it broadcasts a message to all sites to abort the transaction.
A transaction can fail during the commit phase.
Such a transaction is said to be in limbo.
How does the protocol work? (4)
A limbo transaction can be identified by polling or by a timeout. With a timeout (no confirmation of commit within a specified period), it is not possible to distinguish a failed site from a busy one.
Polling is expensive in terms of network load and processing time.
If a commit confirmation is not received from one or more sites, the originating site sends an abort message that forces all sites to undo their changes.
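The preparation and final commit phases can be sketched in Python as follows. This is a simplified, single-process illustration: the Participant class, its can_commit flag, and the two_phase_commit function are hypothetical names, and the sketch does not model message loss, timeouts, polling, or limbo transactions.

```python
class Participant:
    """A site holding one sub-transaction of a global transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: the site already knows its answer
        self.state = "working"

    def prepare(self):
        # Preparation phase: answer OK (True) or not OK (False).
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: ask every site and collect all the answers before deciding.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if every site answered OK.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"

ok_sites = [Participant("site1"), Participant("site2"), Participant("site3")]
ok_outcome = two_phase_commit(ok_sites)

mixed_sites = [Participant("site1"), Participant("site2"),
               Participant("site3", can_commit=False)]
mixed_outcome = two_phase_commit(mixed_sites)
```

In the second run a single not OK answer from site3 is enough to abort the transaction at every site, including the two that were willing to commit.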
Improvement Strategies for Two-Phase Commit
The two-phase commit protocol is slow because of the delays caused by extensive coordination among sites. Some improvements that have been developed are:
Read-only optimization
This approach identifies the read-only portions of a transaction and eliminates the need for confirmation messages on them. For example, a transaction may read data before inserting new data (an inventory balance check before creating a new order), so the read can happen without a callback confirmation.
Lazy commit optimization
This approach allows the sites that can update to proceed with the update; the sites that cannot update are allowed to catch up later.
Linear commit optimization
This approach allows each part of a transaction (sub-transaction) to be committed in sequence, rather than holding up the whole transaction when some sub-transactions are delayed.
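As one way to picture the read-only optimization, the sketch below lets a site answer the preparation message with a third value, after which it is left out of the final commit phase. The vote strings, the site names, and the decide function are assumptions chosen for illustration and are not taken from a specific product.

```python
def decide(votes):
    """Decide the outcome and list the sites that still need a phase-2 message.

    votes maps a site name to "OK", "NOT OK", or "READ-ONLY" (hypothetical labels).
    Sites that only read data have nothing to commit or undo, so they are
    dropped from the final commit phase.
    """
    updaters = [site for site, vote in votes.items() if vote != "READ-ONLY"]
    if any(votes[site] == "NOT OK" for site in updaters):
        decision = "abort"
    else:
        decision = "commit"
    return decision, sorted(updaters)

# site2 only checked an inventory balance; site1 and site3 inserted order data.
decision, notified = decide({"site1": "OK", "site2": "READ-ONLY", "site3": "OK"})
```

Here the final commit message goes to two sites instead of three, which is where the saving in coordination comes from.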
Concurrency Transparency
Concurrency control
When multiple users access and update a database, data integrity may be lost unless locking mechanisms are used to protect the data from the effects of concurrent updates.
Concurrency control is more complex in a distributed database because users are geographically distributed across sites and data are often replicated at several sites.
Concurrency transparency
Allows a distributed system to run many transactions
Allows each transaction to appear as if it were the only activity in the system
When several transactions are processed concurrently, the results must be the same as if each transaction were processed in serial order.
Transaction managers at each site must cooperate to provide concurrency control in a distributed database.
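The requirement that concurrent results match a serial order can be seen in a small Python simulation of two transactions that each reduce a stock quantity. The schedule format, the run function, and the numbers are made up for this sketch; no concurrency control is applied, which is exactly why the interleaved run goes wrong.

```python
def run(schedule, initial_qty):
    """Execute (txn, operation, amount) steps against a single stored quantity."""
    db = {"qty": initial_qty}
    private = {}  # each transaction's private copy of the value it read
    for txn, operation, amount in schedule:
        if operation == "read":
            private[txn] = db["qty"]
        elif operation == "write":
            db["qty"] = private[txn] + amount
    return db["qty"]

# T1 removes 10 units and T2 removes 20 units from a stock of 100.
serial = [
    ("T1", "read", None), ("T1", "write", -10),
    ("T2", "read", None), ("T2", "write", -20),
]
interleaved = [
    ("T1", "read", None), ("T2", "read", None),
    ("T1", "write", -10), ("T2", "write", -20),
]

serial_result = run(serial, 100)            # 70: both reductions are applied
interleaved_result = run(interleaved, 100)  # 80: T1's reduction is overwritten
```

The interleaved schedule loses T1's update because T2 wrote a value based on a stale read, so its result matches no serial order of T1 and T2.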
Concurrency Control Approaches
1st approach: Locking
Data retrieved by a user for updating must be locked, or denied to other users, until that update is completed or aborted. Transactions can place locks on data resources.
Locks are implemented at the following levels:
Database
Table
Block or page
Record
Field
Lock types:
Shared lock (S lock or read lock)
Exclusive lock (X lock or write lock)
Details on page 474 (Hoffer - 6/e)
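The interaction of shared and exclusive locks can be sketched with a toy lock manager in Python. The LockManager class and its methods are hypothetical; the sketch denies a conflicting request immediately instead of making the transaction wait, and it does not detect deadlocks.

```python
class LockManager:
    def __init__(self):
        self.locks = {}  # resource -> (mode, set of holding transactions)

    def acquire(self, txn, resource, mode):
        """Try to lock resource in mode "S" or "X"; return True if granted."""
        held = self.locks.get(resource)
        if held is None:
            self.locks[resource] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # any number of readers may share the resource
            return True
        if holders == {txn}:
            # The sole holder may re-request the lock or upgrade S to X.
            if mode == "X":
                self.locks[resource] = ("X", holders)
            return True
        return False  # S against X, or X against anything held by others

    def release(self, txn, resource):
        held = self.locks.get(resource)
        if held is None:
            return
        holders = held[1]
        holders.discard(txn)
        if not holders:
            del self.locks[resource]

lm = LockManager()
first_read = lm.acquire("T1", "record7", "S")
second_read = lm.acquire("T2", "record7", "S")
blocked_write = lm.acquire("T3", "record7", "X")
lm.release("T1", "record7")
lm.release("T2", "record7")
granted_write = lm.acquire("T3", "record7", "X")
blocked_read = lm.acquire("T1", "record7", "S")
```

Two readers share the record, the writer is refused while they hold it, and once the writer holds the exclusive lock a new reader is refused in turn.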
Concurrency Control Approaches
2nd approach: Versioning
Uses an optimistic approach (the opposite of the pessimistic locking approach), which assumes that most of the time other users do not want the same record.
Each transaction is restricted to a view of the database as of the time that transaction started. When a transaction modifies a record, the DBMS creates a new record version instead of overwriting the old record.
When there is no conflict (only one user's transaction changes the database records), the changes are merged directly into the database.
If two users make conflicting changes, the changes made by one of the users are committed to the database; usually the earlier time stamped transaction gets pri