Unit - 4

Introduction to the Other Databases

DISTRIBUTED DATABASE

Introduction :- The Distributed Database System (DDBS) is a database physically stored on several computer systems across several sites connected together via a communication network.

Each site is typically managed by a DBMS that is capable of running independently of the other sites.

In other words, each site is a database system site in its own right and has its own local users, its own local DBMS, and its own data communication managers.

Each site has its own transaction management software, including its own locking, logging and recovery software.

Although geographically dispersed, a distributed database system manages and controls the entire database as a single collection of data.

The location of all data items and the degree of autonomy of individual sites have a significant impact on all aspects of the system, including query optimization and processing, concurrency control and recovery.

In a DDBS, both data and transaction processing are divided among two or more computers connected by a network, each computer playing a special role in the system.

The computers in a distributed system communicate with one another via various communication media. They do not share main memory or disk.

A DDBS allows applications to access data from local or remote databases.

A DDBS uses a client/server architecture to process information requests. The computers in a DDBS are referred to by a number of different names, such as sites or nodes.

A distributed database system is spread over geographically distributed locations because each location needs to use its part of the database locally rather than through remote access.

For example, the local branches of a multinational or national bank, or of a large company, can have their localized databases situated at the different branches.

Advances in communication and networking systems triggered the development of the distributed database approach.

It became possible to allow these distributed systems to communicate among themselves, so that data can be accessed effectively among computer systems in different geographical locations.

As a result, the machines at different sites are quite likely to be heterogeneous, with entirely different individual architectures.

General Distributed Database Architecture

Desired Properties of DDBS :- A distributed database should have the following properties :-

Distributed data independence.

Distributed transaction atomicity.

1. Distributed data independence :- This property enables users to ask queries without specifying where the referenced relations, or copies or fragments of the relations, are located.

This principle is a natural extension of physical and logical data independence.

Further, queries that span multiple sites should be optimized systematically in a cost-based manner, taking into account communication costs and differences in local computation costs.
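Distributed data independence can be illustrated with a minimal sketch in Python. The catalog, the site contents and the run_query helper are hypothetical and are not taken from any real DDBMS; the only point is that the user names a relation and a condition, while the system resolves where the fragments are stored.

```python
# Hypothetical global catalog: relation name -> sites holding its fragments.
CATALOG = {
    "EMPLOYEE": ["Mumbai", "Jamsedpur", "London"],
}

# Hypothetical per-site storage; each tuple is (employee id, department id).
SITE_DATA = {
    "Mumbai": [("E-103", 2), ("E-106", 2)],
    "Jamsedpur": [("E-101", 3), ("E-104", 3)],
    "London": [("E-102", 4), ("E-105", 4)],
}

def run_query(relation, predicate):
    """Answer a query on `relation` without the caller naming any site."""
    rows = []
    for site in CATALOG[relation]:      # location is resolved by the system
        rows.extend(t for t in SITE_DATA[site] if predicate(t))
    return sorted(rows)

# The user asks for department 3 and never mentions where the data lives.
dept3 = run_query("EMPLOYEE", lambda t: t[1] == 3)
```

This sketch visits every site listed in the catalog; a cost-based optimizer would normally try to contact only the sites that can hold matching tuples.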

2. Distributed Transaction Atomicity :-

This property enables users to write transactions that access and update data at several sites just as they would write transactions over purely local data.

In particular, the effects of a transaction across sites should continue to be atomic.

That is, all changes persist if the transaction commits, and none persist if it aborts.
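One common way to obtain this all-or-nothing behaviour is a two-phase commit protocol. The notes do not name a protocol, so the sketch below is an illustration under that assumption; the Site class, its methods and the run_transaction helper are hypothetical, and logging and failure recovery are left out.

```python
class Site:
    """Hypothetical participant site holding local data."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.data = {}       # committed state
        self.pending = {}    # changes staged by the current transaction

    def prepare(self, changes):
        # Phase 1: stage the changes and vote yes or no.
        self.pending = dict(changes)
        return self.can_commit

    def commit(self):
        # Phase 2, all sites voted yes: make the staged changes permanent.
        self.data.update(self.pending)
        self.pending = {}

    def abort(self):
        # Phase 2, some site voted no: discard the staged changes.
        self.pending = {}

def run_transaction(changes_by_site):
    """Apply changes at several sites atomically; return True if committed."""
    sites = list(changes_by_site)
    votes = [site.prepare(changes_by_site[site]) for site in sites]
    if all(votes):
        for site in sites:
            site.commit()
        return True
    for site in sites:
        site.abort()
    return False

mumbai = Site("Mumbai")
london = Site("London", can_commit=False)   # this site will vote no
ok = run_transaction({mumbai: {"E-103": 13000}, london: {"E-102": 15000}})
```

Because London votes no, neither site keeps any change, which is the "none persist if it aborts" case.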

Types of Distributed Databases :- In a distributed database system the data and software are distributed over multiple sites connected by a communication network.

However, the term DDBS can describe various systems that differ from one another in many respects, depending on factors such as the degree of homogeneity, the degree of local autonomy, and so on.

The following two types of distributed database are most commonly used :-

Homogeneous DDBS.

Heterogeneous DDBS.

1. Homogeneous DDBS :- This is the simplest form of distributed database, where there are several sites, each running its own applications on the same DBMS software.

All sites have identical software, are aware of one another and agree to cooperate in processing user requests.

The applications can all see the same schema and run the same transactions.

That is, there is location transparency in a homogeneous DDBS. The provision of location transparency forms the core of distributed database management system (DDBMS) development.

In a homogeneous DDBS, the use of a single DBMS avoids any problem of mismatched database capabilities between nodes, since the data is all managed within a single framework.

Homogeneous Distributed Database

2. Heterogeneous DDBS :-

In this DDBS, different sites run under the control of different DBMSs, essentially autonomously and are connected somehow to enable access to data from multiple sites.

Different sites may use different schemas and different DBMS software.

The sites may not be aware of one another and they may provide only limited facilities for cooperation in transaction processing.

In other words, in heterogeneous DDBS, each site is an independent and centralized DBMS that has its own local users, local transactions and database administrator (DBA).

Heterogeneous Distributed Database

Advantages of DDBS :-

Sharing of data, where users at one site may be able to access the data residing at other sites and at the same time retain control over the data at their own site.

Increased efficiency of processing by keeping the data close to the point where it is most frequently used.

Efficient management of distributed data with different levels of transparency.

It enables the structure of the database to mirror the structure of the enterprise, in which the local data can be kept locally, while at the same time remote data can be accessed when necessary.

Increased local autonomy, where each site is able to retain a degree of control over data that are stored locally.

Increased accessibility by allowing access to data at several sites via the communication network.

Increased availability: if one site fails, the remaining sites may be able to continue operating.

Increased reliability due to greater accessibility.

Improved performance.

Improved scalability.

Easier expansion with the growth of the organization in terms of adding more data, increasing database size and adding more CPUs.

Parallel evaluation by subdividing a query into sub-queries involving data from several sites.

Disadvantages of DDBS :-

Recovery from failure is more complex.

Increased complexity in system design and implementation.

Increased transparency leads to a compromise between ease of use and the overhead cost of providing transparency.

Increased software development cost.

Greater potential for bugs.

Increased processing overhead.

Technical problems of connecting dissimilar machines.

Difficulty in database integrity control.

Security concerns over replicated data in multiple locations and over the network.

ARCHITECTURE OF DISTRIBUTED DATABASE

1. CLIENT / SERVER ARCHITECTURE.

2. COLLABORATING SERVER ARCHITECTURE.

3. MIDDLEWARE ARCHITECTURE.

1. Client / Server Architecture :- Client/server architectures are those in which the DBMS-related workload is split into two logical components, namely client and server, each of which typically executes on a different system.

The client is the user of the resources, whereas the server is the provider of the resources.

The architecture has one or more client processors and one or more server processors. The applications and tools are put on client platforms, and they are connected to the database management system that resides on the server platform.

The applications and tools act as clients of the DBMS, making requests for its services. The DBMS, in turn, serves these requests and returns the results to the client(s).

Clients are responsible for user interface issues and servers manage data and execute transactions.

In other words the client/server architecture can be used to implement a DBMS in which the client is the transaction processor (TP) and the server is the data processor (DP).

A client process could run on a personal computer and send queries to a server running on a mainframe computer.

Many modern information systems are based on client/server architecture.

Client/Server database Architecture

Components of client/server architecture :-

Clients, in the form of workstations, as the user’s contact point.

DBMS servers as common resources performing specialized tasks for devices requesting their services.

Communication network connecting the clients and the servers.

Software applications connecting clients, servers and network to create a single logical architecture.

Client applications issue SQL statements for data access, just as they do in a centralized computing environment.

The networking interface enables client applications to connect to the server, send the SQL statements created by the client to the server, and receive the result or error return code that the server sends back after processing each SQL statement.
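The request and reply exchange described above can be sketched as follows. This is only an illustration: an in-memory SQLite database from Python's standard sqlite3 module stands in for the DBMS server, the Server class and client_request function are hypothetical names, and the network hop is replaced by a direct method call.

```python
import sqlite3

class Server:
    """Stand-in for a DBMS server (assumption: in-memory SQLite database)."""
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE employee (id TEXT, dept_id INTEGER)")
        self.db.execute("INSERT INTO employee VALUES ('E-101', 3)")

    def handle(self, sql):
        # Process one SQL statement and reply with (status, payload).
        try:
            return "OK", self.db.execute(sql).fetchall()
        except sqlite3.Error as exc:
            return "ERROR", str(exc)

def client_request(server, sql):
    # The client only builds the SQL text and interprets the reply.
    return server.handle(sql)

server = Server()
good = client_request(server, "SELECT id FROM employee WHERE dept_id = 3")
bad = client_request(server, "SELECT id FROM no_such_table")
```

The first request comes back with a result set and the second with an error status, which mirrors the "result or error return code" that the client receives.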

Benefits of Client/Server Architecture :-

Relatively simple to implement because of the centralized server and clean separation of functionalities.

Better adaptability to the computing environment to meet the ever-changing business needs of the organization.

Use of a Graphical User Interface (GUI) on the microcomputer by the user at the client improves functionality and simplicity.

It is less expensive than a minicomputer or mainframe solution.

Expensive server machines are optimally utilized because users interact with the inexpensive client machines.

Overall productivity improvement due to decentralized operations.

Improved performance with more processing power.

Limitations of Client/Server Architecture :-

The client/server architecture does not allow a single query to span multiple servers, because the client process would have to be capable of breaking such a query into appropriate sub-queries to be executed at the different sites and then putting together the answers to the sub-queries.

An increase in the number of users and processing sites often creates security problems.

2. Collaborating Server Systems :- In a collaborating server architecture, there are several database servers, each capable of running transactions against local data, which cooperatively execute transactions spanning multiple servers.

When a server receives a query that requires access to data at other servers, it generates appropriate sub-queries to be executed by the other servers and puts the results together to compute the answer to the original query.
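A minimal sketch of this cooperation is given below. The DBServer class, its methods and the sample tuples are hypothetical; the sub-query is simply the same salary filter run at each peer, whereas a real system would generate a different sub-query for each server as needed.

```python
class DBServer:
    """Hypothetical collaborating server holding part of EMPLOYEE."""
    def __init__(self, name, rows):
        self.name = name
        self.rows = rows      # local tuples: (id, dept_id, salary)
        self.peers = []       # other servers it can cooperate with

    def local_query(self, min_salary):
        # Sub-query executed purely against local data.
        return [r for r in self.rows if r[2] >= min_salary]

    def query(self, min_salary):
        # Receiving server: run the sub-query locally and at every peer,
        # then put the partial results together.
        result = self.local_query(min_salary)
        for peer in self.peers:
            result.extend(peer.local_query(min_salary))
        return sorted(result)

mumbai = DBServer("Mumbai", [("E-103", 2, 13000), ("E-106", 2, 15000)])
london = DBServer("London", [("E-102", 4, 15000), ("E-105", 4, 12000)])
mumbai.peers = [london]
london.peers = [mumbai]

high_paid = mumbai.query(15000)
```

The same answer is obtained whichever server receives the query, because every server is able to split the work and combine the results.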

3. Middleware Systems :- The middleware database architecture, also called data access middleware, is designed to allow a single query to span multiple servers, without requiring all database servers to be capable of managing such multisite execution strategies.

Data access middleware provides users with a consistent interface to multiple DBMSs and file systems in a transparent manner.

Data access middleware simplifies the heterogeneous environment for programmers and provides users with an easier means of accessing live data in multiple sources.

It eliminates the need for programmers to code many environment-specific requests or calls in any application that needs access to current data rather than to copies of data.

The direct requests or calls for data movement to several DBMSs are handled by the middleware, and hence a major rewrite of the application program is not required.

The middleware is basically a layer of software, which works as a special server and coordinates the execution of queries and transactions across one or more independent data servers.

The middleware server is capable of executing joins and other relational operations on data obtained from the other servers, but typically does not itself maintain any data.

Middleware might be responsible for routing a local request to one or more servers, transporting the request by supporting various networking protocols, and converting data from one format to another.

Middleware System

Data access middleware architecture consists of middleware application programming interface (API), middleware engine, drivers and native interfaces.

The API usually consists of a series of available function calls as well as a series of data access statements (dynamic SQL, QBE and so on).

The middleware engine is basically an application programming interface for routing requests to various drivers and performing other functions. It handles the data access requests that have been issued.

Drivers are used to connect to the various data sources, and they translate the requests received from the API into the proper format, which is understood by the targeted data source.
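The relationship between the API, the engine and the drivers can be sketched as follows. All class names, the two toy data sources and the join method are hypothetical; the sketch only shows an engine that routes each request to the right driver, lets the driver translate it for its source, and joins the results itself without storing any data.

```python
class CsvDriver:
    """Hypothetical driver for a source that stores rows as CSV text."""
    def __init__(self, text):
        self.text = text

    def fetch(self, table):
        # Translate the request into this source's native access method.
        # (This toy source holds a single table, so `table` is not used.)
        return [tuple(line.split(",")) for line in self.text.splitlines()]

class DictDriver:
    """Hypothetical driver for a source that stores rows as dictionaries."""
    def __init__(self, tables):
        self.tables = tables

    def fetch(self, table):
        return [(r["tid"], r["salary"]) for r in self.tables[table]]

class MiddlewareEngine:
    """Routes each request to the driver registered for the table."""
    def __init__(self):
        self.routes = {}

    def register(self, table, driver):
        self.routes[table] = driver

    def fetch(self, table):
        return self.routes[table].fetch(table)

    def join(self, left, right):
        # The engine joins data obtained from the servers on the first column.
        right_rows = {r[0]: r[1:] for r in self.fetch(right)}
        return [row + right_rows[row[0]]
                for row in self.fetch(left) if row[0] in right_rows]

engine = MiddlewareEngine()
engine.register("names", CsvDriver("T-1,XYZ\nT-2,XYZ"))
engine.register("salaries",
                DictDriver({"salaries": [{"tid": "T-1", "salary": 12000}]}))
joined = engine.join("names", "salaries")
```

The application calls only the engine; adding a new kind of data source means writing one more driver rather than changing the application.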

DISTRIBUTED DATABASE SYSTEM (DDBS) DESIGN

1.) Data Fragmentation

2.) Data Allocation

3.) Data Replication

DATA FRAGMENTATION

Data Fragmentation :- This is applied to a relational database system to partition the relations among network sites.

The technique of breaking up the database into logical units, which may be assigned for storage at the various sites, is called Data Fragmentation.

In fragmentation, a relation can be partitioned into several fragments for physical storage purposes, and there may be several replicas of each fragment.

These fragments contain sufficient information to allow reconstruction of the original relation.

All fragments of a given relation are independent.

None of the fragments can be derived from the others.

For example, let us consider a relation EMPLOYEE :

ID       NAME    DEPT_ID    SALARY
E-101    XYZ     3          12,000
E-102    XYZ     4          15,000
E-103    XYZ     2          13,000
E-104    XYZ     3          14,500
E-105    XYZ     4          12,000
E-106    XYZ     2          15,000

Now this relation can be fragmented into three fragments as follows :-

Main Relation :- EMPLOYEE

Fragment         AT SITE      Based on
Mumbai_Emp       Mumbai       Dept_ID = 2
Jamsedpur_Emp    Jamsedpur    Dept_ID = 3
London_Emp       London       Dept_ID = 4

The above fragmented relation can be stored at various sites as shown in the table, in which the tuples for Mumbai employees with Dept_ID = 2 are stored at the Mumbai site, the tuples for Jamsedpur employees with Dept_ID = 3 are stored at the Jamsedpur site, and the tuples for London employees with Dept_ID = 4 are stored at the London site.

In this example the fragment names are Mumbai_Emp, Jamsedpur_Emp and London_Emp.

Reconstruction of original relation is done via suitable JOIN and UNION operations.

A system that supports data fragmentation should also support fragmentation independence, also called fragmentation transparency.

That means the users should not be logically concerned about the fragmentation.

The users should have the feeling that the data were not fragmented at all.

In other words, fragmentation independence implies that the users will be presented with a view of the data in which the fragments are logically recombined by means of suitable JOINs and UNIONs.

It is the responsibility of the system optimizer to determine which fragments need to be physically accessed in order to satisfy any given user request.

Following are the three different schemes for fragmenting a relation :

Horizontal Fragmentation

Vertical Fragmentation

Mixed Fragmentation

Horizontal Fragmentation :- A horizontal fragment of a relation is a subset of the tuples, with all attributes, in that relation.

Horizontal fragmentation splits the relation horizontally by assigning each tuple or a group of tuples of a relation to one or more fragments, where each tuple or subset has a certain logical meaning.

These fragments can be assigned to different sites in the distributed database system.

A horizontal fragmentation is produced by specifying a predicate that performs a restriction on the tuples in the relation.

Relation :- Jamsedpur_Emp

ID       NAME    DEPT_ID    SALARY
E-101    XYZ     3          12,000
E-104    XYZ     3          14,500

Relation :- London_Emp

ID       NAME    DEPT_ID    SALARY
E-102    XYZ     4          15,000
E-105    XYZ     4          12,000

Relation :- Mumbai_Emp

ID       NAME    DEPT_ID    SALARY
E-103    XYZ     2          13,000
E-106    XYZ     2          15,000

The horizontal fragmentation can be written in terms of relational algebra as :

σ<condition>(R)

MUMBAI_EMP : σ Dept_ID = 2 (EMPLOYEE)

JAMSEDPUR_EMP : σ Dept_ID = 3 (EMPLOYEE)

LONDON_EMP : σ Dept_ID = 4 (EMPLOYEE)

In horizontal fragmentation, the UNION operation is used to reconstruct the original relation.
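The selection and the UNION-based reconstruction can be checked with a short Python sketch. The tuples are the EMPLOYEE rows of the example; the select helper and the use of Python lists and sets to stand in for relations are assumptions made only for illustration.

```python
# EMPLOYEE tuples from the example: (ID, NAME, DEPT_ID, SALARY).
EMPLOYEE = [
    ("E-101", "XYZ", 3, 12000),
    ("E-102", "XYZ", 4, 15000),
    ("E-103", "XYZ", 2, 13000),
    ("E-104", "XYZ", 3, 14500),
    ("E-105", "XYZ", 4, 12000),
    ("E-106", "XYZ", 2, 15000),
]

def select(relation, predicate):
    """Relational selection: keep whole tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

# One fragment per Dept_ID value, as in the fragment table.
mumbai_emp = select(EMPLOYEE, lambda t: t[2] == 2)
jamsedpur_emp = select(EMPLOYEE, lambda t: t[2] == 3)
london_emp = select(EMPLOYEE, lambda t: t[2] == 4)

# UNION of the fragments reconstructs the original relation.
reconstructed = sorted(set(mumbai_emp) | set(jamsedpur_emp) | set(london_emp))
```

Every tuple keeps all of its attributes and lands in exactly one fragment here, so the union loses nothing and creates no duplicates.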

Vertical Fragmentation :- A vertical fragmentation splits the relation by decomposing it “vertically” by columns (attributes).

A vertical fragment of a relation keeps only certain attributes of the relation at a particular site, because each site may not need all the attributes of the relation.

Thus vertical fragmentation groups together the attributes in the relation that are used jointly by the important transactions.

A simple vertical fragmentation is not quite proper when the fragments are stored separately: if there is no common attribute between the fragments, we cannot put the original EMPLOYEE relation back together.

Therefore it is necessary to include the primary key or a candidate key attribute in every vertical fragment.

Π a1, a2, …an (R)

For example :

Fragment EMPLOYEE table :

MUMBAI_EMP : (TID, EMP_ID, EMP_NAME)

JAMSEDPUR_EMP : (TID, DEPT_ID)

LONDON_EMP : (TID, EMP_SALARY)

MUMBAI_EMP : Π TID, EMP_ID, EMP_NAME (EMPLOYEE)

JAMSEDPUR_EMP : Π TID, DEPT_ID (EMPLOYEE)

LONDON_EMP : Π TID, EMP_SALARY (EMPLOYEE)

The original relation is obtained by performing a JOIN operation on the common attribute TID.

Relation :- Mumbai_Emp

TID    EMP_ID     EMP_NAME
T-1    E-10215    XYZ
T-2    E-14587    XYZ
T-3    E-45875    XYZ
T-4    E-87456    XYZ

Relation :- London_Emp

TID    EMP_SALARY
T-1    12,000
T-2    15,000
T-3    16,000
T-4    18,000

Relation :- Jamsedpur_Emp

TID    DEPT_ID
T-1    2
T-2    3
T-3    2
T-4    3
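The projection and the JOIN-based reconstruction can be sketched in Python as follows. The full EMPLOYEE tuples are assembled from the three fragment tables above by matching TID; the project and join_on_tid helpers are hypothetical names used only for this illustration.

```python
# EMPLOYEE tuples: (TID, EMP_ID, EMP_NAME, DEPT_ID, EMP_SALARY).
EMPLOYEE = [
    ("T-1", "E-10215", "XYZ", 2, 12000),
    ("T-2", "E-14587", "XYZ", 3, 15000),
    ("T-3", "E-45875", "XYZ", 2, 16000),
    ("T-4", "E-87456", "XYZ", 3, 18000),
]

def project(relation, columns):
    """Relational projection: keep only the listed column positions."""
    return [tuple(t[c] for c in columns) for t in relation]

mumbai_emp = project(EMPLOYEE, [0, 1, 2])    # TID, EMP_ID, EMP_NAME
jamsedpur_emp = project(EMPLOYEE, [0, 3])    # TID, DEPT_ID
london_emp = project(EMPLOYEE, [0, 4])       # TID, EMP_SALARY

def join_on_tid(left, right):
    """Join two fragments on their shared first column (TID)."""
    index = {r[0]: r[1:] for r in right}
    return [row + index[row[0]] for row in left if row[0] in index]

# Joining the three fragments on TID rebuilds the original relation.
reconstructed = join_on_tid(join_on_tid(mumbai_emp, jamsedpur_emp), london_emp)
```

If TID were left out of any fragment, join_on_tid would have nothing to match on, which is why the key must be repeated in every vertical fragment.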

Mixed Fragmentation :- Sometimes, horizontal or vertical fragmentation of a database schema by itself is insufficient to adequately distribute the data for some applications. For that, mixed or hybrid fragmentation is required.

Thus horizontal fragmentation of a relation followed by further vertical fragmentation, or vice versa, is called Mixed Fragmentation.

A mixed fragmentation is defined by combining the SELECT and PROJECT operations of relational algebra :

Π a1, a2, …an (σ<condition>(R))

σ<condition>(Π a1, a2, …an (R))

The original relation can be obtained by performing JOIN and UNION operations of relational algebra.
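A small sketch of mixed fragmentation, using a selection followed by a projection, is given below. The relation R, the fragment names and the helper functions are hypothetical; the sketch only shows that JOIN rebuilds each horizontal slice and UNION then rebuilds the whole relation.

```python
# Hypothetical relation: (TID, NAME, DEPT_ID, SALARY).
R = [
    ("T-1", "XYZ", 2, 12000),
    ("T-2", "XYZ", 3, 15000),
    ("T-3", "XYZ", 2, 16000),
]

def mixed_fragment(relation, predicate, columns):
    """Select the rows that satisfy the predicate, then project the columns."""
    return [tuple(t[c] for c in columns) for t in relation if predicate(t)]

def in_dept2(t):
    return t[2] == 2

def in_dept3(t):
    return t[2] == 3

# Each horizontal slice is split vertically; TID (column 0) is kept in every
# fragment so that the pieces can be joined back together.
frags = {
    "dept2_names": mixed_fragment(R, in_dept2, [0, 1, 2]),
    "dept2_pay": mixed_fragment(R, in_dept2, [0, 3]),
    "dept3_names": mixed_fragment(R, in_dept3, [0, 1, 2]),
    "dept3_pay": mixed_fragment(R, in_dept3, [0, 3]),
}

def join_on_tid(left, right):
    """Join two fragments on their shared first column (TID)."""
    index = {r[0]: r[1:] for r in right}
    return [row + index[row[0]] for row in left if row[0] in index]

# JOIN rebuilds each horizontal slice, UNION rebuilds the whole relation.
dept2 = join_on_tid(frags["dept2_names"], frags["dept2_pay"])
dept3 = join_on_tid(frags["dept3_names"], frags["dept3_pay"])
reconstructed = sorted(set(dept2) | set(dept3))
```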

DATA ALLOCATION

Data Allocation :- Data allocation describes the process of deciding where to locate or place data among the several sites.

Following are the data allocation strategies that are used in a Distributed Database System :

Centralized

Partitioned or fragmented

Replication

1. Centralized Strategies :

In this strategy the entire database and the DBMS are stored at a single site. However, the users are geographically distributed across the network.

Locality of reference is absent for all sites except the central site, because all data access goes to the central site.

Thus the communication costs are high.

Because the entire database resides at one site, the entire database is lost if that single system fails.

Hence the reliability and availability are low.

2. Partitioned Strategies :

In this strategy the database is divided into several disjoint parts (fragments), which are stored at several sites.

Each data item is located at the site where it is used most frequently.

Since there is no replication, the storage cost is low.

The failure of the system at a particular site results in the loss of only that site's data, not of the entire database. Hence the reliability and availability are high.

The communication cost is low and the overall performance is good compared with the centralized strategy.

3. Replication Strategies :

In this strategy copies of one or more database fragments are stored at several sites.

Thus the locality of reference, reliability, availability and read performance are very high, but the communication cost of keeping the copies up to date and the storage cost are also very high.
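The trade-off between the three strategies can be made concrete with a toy calculation. The workload figures, fragment sizes and the evaluate function below are invented for illustration only; the sketch counts stored tuples and remote reads and deliberately ignores the cost of propagating updates to replicas.

```python
# Hypothetical workload: how often each site reads each fragment.
READS = {
    ("Mumbai", "Mumbai_Emp"): 90, ("Mumbai", "London_Emp"): 10,
    ("London", "London_Emp"): 80, ("London", "Mumbai_Emp"): 20,
}
FRAGMENT_SIZE = {"Mumbai_Emp": 2, "London_Emp": 2}   # tuples per fragment

def evaluate(allocation):
    """Return (stored tuples, remote reads) for a fragment -> sites mapping."""
    storage = sum(FRAGMENT_SIZE[f] * len(sites)
                  for f, sites in allocation.items())
    remote = sum(n for (site, f), n in READS.items()
                 if site not in allocation[f])
    return storage, remote

centralized = {"Mumbai_Emp": {"Mumbai"}, "London_Emp": {"Mumbai"}}
partitioned = {"Mumbai_Emp": {"Mumbai"}, "London_Emp": {"London"}}
replicated = {"Mumbai_Emp": {"Mumbai", "London"},
              "London_Emp": {"Mumbai", "London"}}

results = {
    "centralized": evaluate(centralized),
    "partitioned": evaluate(partitioned),
    "replicated": evaluate(replicated),
}
```

With these made-up numbers the centralized allocation causes the most remote reads, the partitioned allocation reduces them at the same storage cost, and full replication removes them at double the storage cost.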

DATA REPLICATION

Data replication is the technique that permits storage of certain data at more than one site.

The system maintains several identical copies of a relation and stores each copy at a different site.

Data replication is introduced to increase the availability of the system.

If a copy is not available due to the failure of a system, it should be possible to access another copy.

Data can be replicated as :

REPLICATE LONDON_EMP AS LONDON-MUMBAI_EMP AT SITE ‘Mumbai’

REPLICATE MUMBAI_EMP AS MUMBAI-LONDON_EMP AT SITE ‘London’

Data replication should also support replication independence also known as replication transparency.

That means users should be able to behave as if the data were in fact not replicated at all.

Replication independence simplifies user programs and terminal activities.

It is the responsibility of the system optimizer to determine which replicas physically need to be accessed in order to satisfy any given user request.

Advantages of data replication :-

Data replication enhances the performance of read operations by increasing access speed at each site. That means that, with data replication, applications can operate on local copies instead of having to communicate with remote sites.

Data replication increases the availability of data to read-only transactions. That means a given replicated object remains available for processing, at least for retrieval, so long as at least one copy is available.

Disadvantages of data replication :-

Increased overhead of update transactions. That means, when a given replicated object is updated, all copies of that object must be updated.

More complexity in controlling concurrent updates by several transactions to replicated data.
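The read advantage and the update overhead can both be seen in a small sketch. The ReplicatedRelation class and its access counter are hypothetical; the sketch reads from one available copy and writes to every copy, and for simplicity it does not handle a write that arrives while a site is down or concurrent updates.

```python
class ReplicatedRelation:
    """Hypothetical replicated object: one copy of the data per site."""
    def __init__(self, sites):
        self.copies = {site: {} for site in sites}
        self.up = {site: True for site in sites}
        self.accesses = 0            # counts copies touched

    def read(self, key, local_site):
        # Prefer the local copy; fall back to any other available copy.
        order = [local_site] + [s for s in self.copies if s != local_site]
        for site in order:
            if self.up[site]:
                self.accesses += 1
                return self.copies[site].get(key)
        raise RuntimeError("no copy available")

    def write(self, key, value):
        # Every copy must be updated, so the cost grows with the replica count.
        for site in self.copies:
            self.copies[site][key] = value
            self.accesses += 1

emp = ReplicatedRelation(["Mumbai", "London"])
emp.write("E-102", 15000)             # touches both copies
emp.up["Mumbai"] = False              # the Mumbai site fails
value = emp.read("E-102", "Mumbai")   # still answered, from London
```

The read succeeds after the Mumbai failure because one copy is still available, while the single write had to touch both copies.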
