1

Distributed and Parallel Databases

2

Distributed Databases

• Distributed Systems goal:
  – to offer local DB autonomy at geographically distributed locations

• Multiple CPUs – each has a DBMS, but the data is distributed

• Loosely coupled
  – homogeneous
  – heterogeneous – different DBMSs, need ODBC, standard SQL

3

Advantages of DDBs

• distributed nature of some DB applications (bank branches)

• increased reliability and availability when a site fails – data can also be replicated at more than one site

• data sharing but also local control

• improved performance - smaller DBs exist at each site

• easier expansion

4

Client-Server

• Client-Server – (b) in figure
  – Client sends request for service (strict – fixed roles)
  – 3-tier architecture
    • Presentation tier
    • Logic tier
    • Data tier

5

6

Distributed DBSs (DDBS)

• Distributed DB – (c) in figure
  – WAN
  – Multiple CPUs – each has a DBMS, but the data is distributed
  – lower communication rates
  – Heterogeneous machines
  – Homogeneous DDBS
    • homogeneous – same DBMSs
  – Heterogeneous DDBS
    • different DBMSs – need ODBC, standard SQL

7

Heterogeneous distributed DBSs (HDDBs)

• Data distributed and each site has its own DBMS: ORACLE at one site, DB2 at another, etc.

• need ODBC, standard SQL

• usually a transaction manager is responsible for cooperation among sites

• must coordinate distributed transactions

• need data conversion and access to data at other sites

P2P

• P2P – every site can act as a server to store part of the DB and as a client to request service

8

9

Federated DB - FDBS

• a federated DB is a multidatabase that is autonomous – (a) in figure

• collection of cooperating DBSs that are heterogeneous

• preexisting DBs form a new database

• Each DB specifies an import/export schema (view)
  – keeps a partial view of the total schema

• Each DB has its own local users, local transparency and DBA
  – appears centralized for local autonomous users
  – appears distributed for global users

10

DDBS

• Issues in DDBSs are covered in the slides that follow

11

Replication

• Full vs. partial replication

• Which copy to access

• Improves performance for global queries, but updates are a problem

• Ensure consistency of replicated copies of data
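
The trade-off in the bullets above (cheap reads, costly updates) can be illustrated with a read-one/write-all scheme. This is only a sketch of one common approach, not a scheme the slides name; the class `ReplicatedItem` and the site names are hypothetical.

```python
class ReplicatedItem:
    """One logical data item stored as copies at several sites (hypothetical)."""

    def __init__(self, sites, value):
        # Full replication: every listed site starts with the same copy.
        self.copies = {site: value for site in sites}

    def read(self, preferred_site):
        # Which copy to access: the local copy if one exists, else any copy.
        if preferred_site in self.copies:
            return self.copies[preferred_site]
        return next(iter(self.copies.values()))

    def write(self, value):
        # Write-all: every copy is updated, so an update touches every
        # replica while a read touches only one.
        for site in self.copies:
            self.copies[site] = value
        return len(self.copies)  # number of copies touched

    def is_consistent(self):
        # All copies hold the same value.
        return len(set(self.copies.values())) == 1


balance = ReplicatedItem(["site1", "site2", "site3"], 100)
local_value = balance.read("site2")
copies_updated = balance.write(250)
```

A real system would also have to handle a site being down during `write`, which this sketch ignores.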

12

Data fragments

• Can distribute a whole relation at a site

or

• Data fragments
  – logical units of the DB assigned for storage at various sites
  – horizontal fragmentation – subset of tuples in the relation (select)
  – vertical fragmentation – keeps only certain attributes of the relation (project); each fragment needs the PK
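
A minimal sketch of the two kinds of fragment, assuming a small hypothetical EMPLOYEE relation with `ssn` as its primary key (the relation, the attribute names and the helper functions are all illustrative). It also shows why a vertical fragment needs the PK: without it the fragments could not be joined back together.

```python
# Hypothetical EMPLOYEE relation; ssn is assumed to be the primary key.
employee = [
    {"ssn": 1, "name": "Ann", "salary": 4000, "dno": 4},
    {"ssn": 2, "name": "Bob", "salary": 6000, "dno": 4},
    {"ssn": 3, "name": "Cy", "salary": 3000, "dno": 5},
]

def horizontal_fragment(relation, predicate):
    # Select: keep the tuples that satisfy the predicate.
    return [row for row in relation if predicate(row)]

def vertical_fragment(relation, attributes, pk="ssn"):
    # Project: keep only some attributes, always including the PK.
    keep = [pk] + [a for a in attributes if a != pk]
    return [{a: row[a] for a in keep} for row in relation]

def join_on_pk(left, right, pk="ssn"):
    # Rebuild full tuples from two vertical fragments by matching the PK.
    right_by_pk = {row[pk]: row for row in right}
    return [{**row, **right_by_pk[row[pk]]} for row in left]

dept4 = horizontal_fragment(employee, lambda r: r["dno"] == 4)
dept5 = horizontal_fragment(employee, lambda r: r["dno"] == 5)
names = vertical_fragment(employee, ["name"])
pay = vertical_fragment(employee, ["salary", "dno"])

rebuilt_h = dept4 + dept5            # union of the horizontal fragments
rebuilt_v = join_on_pk(names, pay)   # join of the vertical fragments
```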

13

Fragments cont’d

• Horizontal fragments:
  – disjoint – each tuple is a member of only 1 fragment
    (example fragment condition: salary < 5000 and dno=4)
  – complete – set of fragments whose conditions include every tuple

• Complete vertical fragmentation:
  – $L_1 \cup L_2 \cup \dots \cup L_n = \text{attributes of } R$
  – $L_i \cap L_j = \mathrm{PK}(R)$ for $i \neq j$
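
The conditions above can be checked mechanically. The sketch below tests disjointness and completeness against one sample set of tuples (it does not prove them for every possible tuple), and tests the two vertical conditions on attribute lists. The sample rows, the three fragment conditions and the function names are assumptions made for illustration.

```python
def matches(row, conditions):
    # Number of fragment conditions that the tuple satisfies.
    return sum(1 for cond in conditions if cond(row))

def is_complete(relation, conditions):
    # Complete: every tuple satisfies at least one fragment condition.
    return all(matches(row, conditions) >= 1 for row in relation)

def is_disjoint(relation, conditions):
    # Disjoint: no tuple satisfies more than one fragment condition.
    return all(matches(row, conditions) <= 1 for row in relation)

def is_complete_vertical(attribute_lists, all_attributes, pk):
    # The lists must cover every attribute and share only the PK pairwise.
    lists = [set(attrs) for attrs in attribute_lists]
    covers = set().union(*lists) == set(all_attributes)
    shares_only_pk = all(
        lists[i] & lists[j] == set(pk)
        for i in range(len(lists))
        for j in range(i + 1, len(lists))
    )
    return covers and shares_only_pk

rows = [
    {"ssn": 1, "salary": 4000, "dno": 4},
    {"ssn": 2, "salary": 6000, "dno": 4},
    {"ssn": 3, "salary": 3000, "dno": 5},
]
conditions = [
    lambda r: r["salary"] < 5000 and r["dno"] == 4,
    lambda r: r["salary"] >= 5000 and r["dno"] == 4,
    lambda r: r["dno"] != 4,
]
```

Dropping any one of the three conditions leaves some tuple uncovered, so the remaining fragments stay disjoint but are no longer complete.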

14

Example replication/fragmentation

• Example of fragments for company DB:
  – site 1 – company headquarters gets the entire DB
  – sites 2, 3 – horizontal fragments based on dept. no.

15

16

Increased complexity

Additional functions needed:

• global vs. local queries

• keep track of data and replication

• execution strategies if data at > 1 site
  – which copy to access
  – maintain consistency of copies

17

To process a query

• Must use a data dictionary that includes info on data distribution among servers

• Ensure atomicity

• Parse user query
  – decomposed into independent site queries
  – each site query sent to appropriate server site
  – site processes local query, sends result to result site
  – result site combines results of subqueries
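
The steps above can be sketched as follows. Everything named here is hypothetical: `DATA_DICTIONARY` stands in for the data dictionary, `SITE_DATA` for the local databases, and the ACCOUNT rows are made up. Atomicity is left out of the sketch entirely.

```python
# Hypothetical data dictionary: which branch values of ACCOUNT each site holds.
DATA_DICTIONARY = {
    "site1": {"branches": {"north"}},
    "site2": {"branches": {"south", "east"}},
}

# Stand-in for the local database at each site.
SITE_DATA = {
    "site1": [{"name": "Ann", "branch": "north", "balance": 50}],
    "site2": [
        {"name": "Bob", "branch": "south", "balance": 900},
        {"name": "Cy", "branch": "east", "balance": 700},
    ],
}

def decompose(branches):
    # One independent site query per site that holds relevant data.
    return {
        site: branches & info["branches"]
        for site, info in DATA_DICTIONARY.items()
        if branches & info["branches"]
    }

def run_local(site, branches, min_balance):
    # The site processes its local query and returns the matching tuples.
    return [
        row for row in SITE_DATA[site]
        if row["branch"] in branches and row["balance"] >= min_balance
    ]

def process_query(branches, min_balance):
    site_queries = decompose(set(branches))
    partial_results = [
        run_local(site, wanted, min_balance)
        for site, wanted in site_queries.items()
    ]
    # The result site combines the results of the subqueries.
    combined = [row for part in partial_results for row in part]
    return sorted(row["name"] for row in combined), sorted(site_queries)
```

Because the dictionary records what each site stores, a query that only concerns the south and east branches is never sent to `site1`.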

18

Architectures

• Distributed Systems goal: to offer local DB autonomy at geographically distributed locations

versus

• Parallel Systems goal: to construct a faster centralized computer
  – Improve performance through parallelization
  – Distribution of data governed by performance
  – Processing, I/O simultaneously

19

Parallel DBSs

• Shared-memory multiprocessor
  – goal: get N times as much work with N CPUs
  – MIMD, SIMD – equal access to the same data, massively parallel

• Parallel shared nothing
  – data split among CPUs, each site has its own CPU, divide work for transactions, communicate over high-speed networks
  – LANs – homogeneous machines
  – CPU + memory – called a site

Query Parallelism

• Decompose query into parts that can be executed in parallel at several sites
  – Intra-query parallelism

• If shared nothing & horizontally fragmented:
    Select name, phone from account where age > 65
  – Decompose into K different queries
  – Result site accepts all and puts together (order by, count)

• What if the query is a join and the table is fragmented?
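
A rough sketch of the fragmented select above, using Python threads to stand in for K shared-nothing sites. The fragment contents and function names are invented for illustration; a real system would ship each sub-query to a separate machine. The join case raised in the last bullet is not covered.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of ACCOUNT, one per shared-nothing site.
FRAGMENTS = [
    [{"name": "Ann", "phone": "555-0101", "age": 70},
     {"name": "Bob", "phone": "555-0102", "age": 40}],
    [{"name": "Cy", "phone": "555-0103", "age": 66}],
    [{"name": "Di", "phone": "555-0104", "age": 30}],
]

def site_query(fragment):
    # Each site evaluates the same selection and projection on its fragment:
    # select name, phone from account where age > 65
    return [(r["name"], r["phone"]) for r in fragment if r["age"] > 65]

def parallel_select(fragments):
    # K fragments give K sub-queries; threads stand in for separate sites.
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = list(pool.map(site_query, fragments))
    # Result site: concatenate, then apply the global order by and count.
    rows = sorted(row for part in partials for row in part)
    return rows, len(rows)

rows, count = parallel_select(FRAGMENTS)
```

The ordering and the count are applied at the result site because no single site sees all the qualifying tuples.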

20

21

Other issues

• Distributed concurrency control using locking

• New models – Cloud computing