Distributed Data Management - KIT (dbis.ipd.kit.edu/download/01-intro.pdf)


IPD, Forschungsbereich Systeme der Informationsverwaltung

Lecture

Distributed Data Management

Chapter 1: Introduction

Erik Buchmann


Erik Buchmann DDM: Introduction – 2

Structure of this Chapter

● Introduction – are databases and distributed data management not completely different concepts, even contradictory?
● Distribution – what is distributed?
● Transparency – classification.
● Distributed transactions.
● Distributed DBMS.
● Query processing.
● Outline of this course.

Introduction
Distribution
Transparency
Distrib. TA
Distributed DBMS
Query Proc.
Structure of course


Introduction


Situation without Databases (1)

● Access to data stored in files.
● Respective functionality is part of applications.

[Figure: the applications Lending, Reminder, Book, and New Entrant access the files BookFile, ReaderFile, and LendingFile directly.]


Situation without Databases (2)

● Redundancy.

[Figure: the applications Lending, Book, New Entrant, and Reminder work on separate files, so BookFile, ReaderFile, and LendingFile each appear more than once.]


Situation without Databases (3)

● Other challenges:
  – concurrency
  – transactional guarantees (atomicity, consistency, isolation, durability)
  – physical / logical data representation independence
  – data privacy, data security
  – no standard approach for the management of huge amounts of data


From Files to Databases

● Access to data stored in files.
● Respective functionality is part of applications (taking physical issues into account, concurrency, access control, consistency).
● Databases factor out this functionality.

[Figure: before, the applications Lending, Reminder, Book, and New Entrant access BookFile, ReaderFile, and LendingFile directly; afterwards, the same applications access the data through a DBMS.]


(Centralized) Databases

● Data integration: all applications use the same database
● Physical/logical data independence
● Efficiency: databases handle large volumes of data
● Integrity constraints: consistency even if many parallel users execute different transactions on the same data
● Declarative query languages, e.g., SQL
● Automatic query optimization


Given all these benefits, why bother with distributed data management?


Why Distributed Data Management, After All?

● Data might be distributed to minimize communication costs

● Data might be distributed to equalize the workload among multiple nodes

● Data might be kept at the site of the creator in order to allow cheap updates

● Data might be replicated at multiple sites to improve availability, throughput and response times

● Some scenarios are distributed by nature, e.g., the IT infrastructure of a global company


Example: A Large Retailer

● 1 central enterprise resource planning system
● 100 stores
● In each store
  – 1 manufacturing information system
  – 3 points of sale
● Different data, queries, updates on each node
● Different hardware used


Databases vs. Distributed Data Management

● Databases and distributed data management – approaches that seem to exclude each other.
  – Databases: The application does not do the data management itself any more; data management is centralized.
  – Distributed Data Management: Many different nodes manage different data at different locations.


Distributed Databases (1)

● Several databases, together with a coordination layer.
● Again, one single point of access
● Distribution transparent to application


[Figure: App. 1, App. 2, and App. 3 access a common coordination layer, which sits on top of three DBMS.]


Distributed Databases (2)

● Do we end up with the deficiencies of the situation without databases again?
  – The distributed DBMS has control over redundant data. The technology discussed here avoids inconsistencies.
  – Generic functionality remains factored out.
  – Distribution is transparent (at least, this is the objective). The user/application programmer has the illusion of dealing with a centralized DB. This objective is not always realistic.
    → We will learn in the next chapters when compromises are unavoidable.


Distribution


Distributed Systems, Distributed Computing

● Distributed system (of computers): Set of autonomous processors which are connected by a network, and which cooperate in fulfilling the tasks assigned to them.
● What is distributed?
  – application logic,
  – function,
  – data,
  – control.


Classification of Distributed Systems (1)

● Degree of coupling: ratio of volume of data exchanged and extent of local processing; typical situations:
  – communication via computer network – weak coupling,
  – shared components (main memory, secondary storage) – strong coupling.
● Structure of the connection, alternatives:
  – point-to-point,
  – shared connection (bus).


Classification of Distributed Systems (2)

● Degree of independence of the components
  – frequency of information exchange: continuously, or only at the beginning and the end (task, result).
● Synchronization of components
  – synchronous
  – asynchronous.


Why Distribution in the First Place? (1)

● Bottom up.
● Top down.


Why Distribution in the First Place? (2)

● Bottom up:
  – corresponds to the structure of the organization,
  – many modern information systems are distributed 'by nature' (multimedia applications, ERPs, web-based information systems and kiosks).


Why Distribution in the First Place? (3)

● Top down:
  – higher reliability, no 'single point of failure',
  – better performance and lower response times, data locality,
  – 'divide & conquer' for complex problems
    ● more computing power becomes available,
    ● software development – less complex and therefore cheaper.


Performance

Impact of distribution on performance:
● Less CPU and I/O contention.
● Data locality: Locality reduces network delays and communication overhead; for physical reasons, this reduction has natural bounds in WANs.
● Inter- and intra-query parallelism.
● Systems that do caching: Intra-query parallelism may even yield superlinear speedup.


Read-Only Performance vs. Update Performance

● 'Mirroring the entire database' only works in read-only environments.
● Approaches in database products:
  – Multiplexing of the database, i.e., production database and query database.
    → better query performance (if the query is not part of a transaction that also contains updates)
  – Time multiplexing: updates are batched while only queries are processed; later on, vice versa.


Fragmentation vs. Replication

Example of distributed application.

New York City: NYC Employees, London Employees, NYC Projects
London: NYC Employees, London Employees, NYC Projects, London Projects
Tokio: Tokio Employees, Tokio Projects, London Projects
Hong Kong: HK Employees, HK Projects


Transparency

● The example points out the necessity of different kinds of transparency (see following slides):
  – data independence,
  – network transparency,
  – replication transparency,
  – fragmentation transparency.


Data Independence

● Data independence: Immunity of application against changes of data organization.
● Type of transparency that also exists in non-distributed context.
● Data definition:
  – schema description,
  – physical data description.
● Correspondingly, two kinds of data independence:
  – logical data independence (e.g., attribute is added),
  – physical data independence (e.g., index is changed).


Physical Data Independence – Illustration

NAME       FIRSTNAME  STREET         AGE
Böhm       Klemens    Nordstrasse    28
Buchmann   Erik       Breiter Weg    26
Duckstein  Ralf       Goethestrasse  25
Saake      Gunter     Waldweg        43

FIRSTNAME  NAME       STREET         AGE
Erik       Buchmann   Breiter Weg    26
Gunter     Saake      Waldweg        43
Klemens    Böhm       Nordstrasse    28
Ralf       Duckstein  Goethestrasse  25

● Query, e.g., select NAME from PERSON where FIRSTNAME = 'Ralf' works for any representation.
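The illustration can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the helper run_query and the index name person_firstname are made up for this example and do not come from the lecture. The same query is run against two physical layouts of PERSON: one in the original column order, one with reordered columns, rows sorted by first name, and an additional index.

```python
import sqlite3

# The query from the slide, written with SQL's standard "=" comparison.
QUERY = "SELECT NAME FROM PERSON WHERE FIRSTNAME = 'Ralf'"

# Rows in (NAME, FIRSTNAME, STREET, AGE) order, as in the first table.
ROWS = [
    ("Böhm", "Klemens", "Nordstrasse", 28),
    ("Buchmann", "Erik", "Breiter Weg", 26),
    ("Duckstein", "Ralf", "Goethestrasse", 25),
    ("Saake", "Gunter", "Waldweg", 43),
]


def run_query(schema_script, rows):
    """Create PERSON with the given physical layout, load rows, run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_script)
    conn.executemany(
        "INSERT INTO PERSON (NAME, FIRSTNAME, STREET, AGE) VALUES (?, ?, ?, ?)",
        rows,
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Representation 1: columns and rows as in the first table.
result_a = run_query(
    "CREATE TABLE PERSON (NAME TEXT, FIRSTNAME TEXT, STREET TEXT, AGE INTEGER);",
    ROWS,
)

# Representation 2: columns reordered, rows sorted by first name, extra index.
result_b = run_query(
    "CREATE TABLE PERSON (FIRSTNAME TEXT, NAME TEXT, STREET TEXT, AGE INTEGER);"
    "CREATE INDEX person_firstname ON PERSON (FIRSTNAME);",
    sorted(ROWS, key=lambda row: row[1]),
)

print(result_a, result_b)
```

Both calls return the same single row, although neither the column order, the row order, nor the presence of the index is visible in the query text.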


Logical Data Independence – Illustration

● Underlying relation: ESD(Employee, Salary, Department)
● create view highIncomeEmp as select Employee, Salary from ESD where Salary > 20
● View can be used like a 'normal' relation, e.g.:
  – select * from highIncomeEmp where Salary < 50
  – insert into highIncomeEmp values ('Klemens', 35000)
● Modifications of the base relation might not affect the view.
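A minimal sketch of the view example, again assuming Python's sqlite3 module as a stand-in DBMS. The sample rows, the department names, and the added Location attribute are hypothetical; salaries are assumed to be given in thousands so that they fit the thresholds 20 and 50. Since SQLite views are read-only, the insert through the view is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ESD (Employee TEXT, Salary INTEGER, Department TEXT)")
# Hypothetical sample data; salaries are assumed to be in thousands.
conn.executemany(
    "INSERT INTO ESD VALUES (?, ?, ?)",
    [("Klemens", 35, "DB"), ("Erik", 15, "DB"), ("Gunter", 60, "IS")],
)

# The view definition from the slide.
conn.execute(
    "CREATE VIEW highIncomeEmp AS "
    "SELECT Employee, Salary FROM ESD WHERE Salary > 20"
)

# The application only knows the view, not the base relation.
view_query = "SELECT * FROM highIncomeEmp WHERE Salary < 50 ORDER BY Employee"
before = conn.execute(view_query).fetchall()

# Logical change of the base relation: an attribute is added.
conn.execute("ALTER TABLE ESD ADD COLUMN Location TEXT")
after = conn.execute(view_query).fetchall()

print(before, after)
```

The query against the view returns the same rows before and after the attribute is added, which is the kind of logical data independence the slide describes.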


Network Transparency

● Existence of network (at least certain details on a technical level) should be hidden from application programmer.
● Two complementary variants:
  – Location transparency: Command is independent of the location of the data and of the system where the command is executed.
  – Naming transparency: Each object has a unique name. (Otherwise: Application must insert location name as part of the object name.)


Popular examples of location/naming transparency?


Network Transparency for Flexibility

● Requirements regarding performance become higher → add new node, instead of replacement of entire system.
● Scale-out.
● Analogously, node may want to leave the (distributed) system.
● We could call it number-of-nodes transparency.


Replication Transparency

● In principle, replication is advantageous:
  – higher locality of reference,
  – reliability and availability.
● Replication transparency: User does not see that replicas exist. (Thus, replication transparency is 'stronger' than network transparency.)
● Important concern of this course:
  – show that mechanisms for replication transparency are very elaborate (in the presence of updates and failures),
  – present mechanisms for replication transparency.


Fragmentation Transparency

● Global query ⇒ fragment queries


New York City: NYC Employees, London Employees, NYC Projects
London: NYC Employees, London Employees, NYC Projects, London Projects
Tokio: Tokio Employees, Tokio Projects, London Projects
Hong Kong: HK Employees, HK Projects
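How a global query might be mapped to fragment queries can be sketched in a few lines of Python. The placement table mirrors the figure; the function plan_fragment_queries and the rule "prefer a replica at the local site, otherwise take the first site found" are assumptions made for illustration, not a strategy stated in the lecture.

```python
# Fragment placement as shown in the figure (site -> fragments stored there).
PLACEMENT = {
    "New York City": ["NYC Employees", "London Employees", "NYC Projects"],
    "London": ["NYC Employees", "London Employees", "NYC Projects", "London Projects"],
    "Tokio": ["Tokio Employees", "Tokio Projects", "London Projects"],
    "Hong Kong": ["HK Employees", "HK Projects"],
}


def plan_fragment_queries(relation, local_site):
    """Map a global query over `relation` to one site per fragment.

    Assumed rule: use the local replica if there is one, otherwise the
    first site that stores the fragment.
    """
    plan = {}
    for site, fragments in PLACEMENT.items():
        for fragment in fragments:
            if not fragment.endswith(" " + relation):
                continue  # fragment belongs to another global relation
            if fragment not in plan or site == local_site:
                plan[fragment] = site
    return plan


# A user in London asks for all employees; the system picks the fragments.
plan = plan_fragment_queries("Employees", local_site="London")
for fragment, site in plan.items():
    print(f"select * from [{fragment}] at {site}")
```

The user only names the global relation Employees; which fragments exist, where they are stored, and which replica is read stays hidden behind the planning step.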


Distributed Transactions (1)

● Important topic of the course: distributed transactions.

● Transaction: Sequence of operations s.t. the system gives certain guarantees for its execution.

● Transaction – transition from one consistent database state to another one.

● Terminology:
  – Atomicity (failure atomicity)
  – Consistency
  – Isolation (concurrency transparency)
  – Durability


Atomicity, Isolation

● Transactional guarantees – in particular, atomicity and isolation.
● Atomicity – example, "bank scenario":
  – Money transfer – two elementary operations.
    ● debit(Klemens, 500),
    ● credit(Gunter, 500).
● Isolation – can be explained with this example, too.

Bank           Person   Balance
Sparkasse      Klemens  5000
Deutsche Bank  Gunter   200
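The atomicity part of the bank scenario can be sketched on a single node with Python's sqlite3 module. The table name Account, the helper functions, and the rule that a balance must not become negative are assumptions added for this example. In the distributed case, where the two accounts live at different sites, a local transaction like this is not sufficient and an atomic commit protocol is needed.

```python
import sqlite3


def make_bank():
    """Set up the two accounts from the slide in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Account ("
        "Bank TEXT, Person TEXT PRIMARY KEY, "
        "Balance INTEGER CHECK (Balance >= 0))"  # assumed rule: no overdraft
    )
    conn.executemany(
        "INSERT INTO Account VALUES (?, ?, ?)",
        [("Sparkasse", "Klemens", 5000), ("Deutsche Bank", "Gunter", 200)],
    )
    conn.commit()
    return conn


def transfer(conn, sender, receiver, amount):
    """Credit and debit as one transaction: both take effect or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Person = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Person = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT Person, Balance FROM Account").fetchall())


conn = make_bank()
ok = transfer(conn, "Klemens", "Gunter", 500)       # succeeds
failed = transfer(conn, "Gunter", "Klemens", 1000)  # debit violates the CHECK
print(ok, failed, balances(conn))
```

In the second call the credit to Klemens has already been executed when the debit fails; the rollback undoes it, so no money is created.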


Which distribution/replication scenarios are possible?


Distributed Transactions (2)

● Several aspects:
  – Failure atomicity (cf. previous example).
  – Distributed transactions for replicated databases
    ● one objective of distributed data management: higher reliability.
    ● failure – other parts of the distributed database shall remain accessible, operation goes on.
● Assumption:
  – site failures and communication failures are always possible.


Distributed DBMS


What is a Distributed DBMS (1)?

● Distributed database = collection of several databases with logical relationships between them, distributed over a network of computers.

● Distributed DBMS = software administering the distributed database and hiding distribution from the users.

→ we are not talking about simple "shared resources" scenarios


What is a Distributed DBMS (2)?

Not a distributed DB, with the reason in each case:
● Collection of files on different nodes of a network
  – Reason: no common structure among the files; no common interface for applications.
● Several databases on the same machine
  – Reason: network is not the only common component.
● DBMS on multi-processor
  – Reason: no message exchange necessary.
● One centralized database in a network
  – Reason: same database problems as before.


Federated vs. Distributed DBMS


● Federated DBMS
  – a number of loosely coupled, independent DBMS
  – different database schemas, query languages, transaction models, programming interfaces
  → wrapper needed, virtual integration
● (Integrated) Distributed DBMS
  – tightly coupled DBMS with logical relationships between them
  – uniform view on all resources
  → no wrapper, physical integration


Advantages of Distributed DBMS

Advantages, as compared to centralized DBMS:
● local autonomy: local control – degree of autonomy not as high as with federated DBMSs
● higher performance: smaller number of transactions in local DBMS, but: sometimes data from different nodes is needed.
● higher reliability and availability: failures do not affect the entire system
● extensibility: new nodes
● cost effectiveness: smaller computers, sharing of resources


Disadvantages (1)

● Design of a distributed DBMS is more complex than that of a centralized DBMS.
  – Replication of data objects
    ● choice of copy to be read,
    ● guaranteeing that update takes place on each copy.
  – Site and/or communication failure: effects of updates must go to all nodes in time.
  – Synchronization of transactions: more difficult in presence of several sites, as compared to centralized case.
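The two replication concerns, choosing a copy to read and updating every copy, can be made concrete with a toy model. The sketch below uses a simple read-one/write-all rule, which is one well-known scheme and is swapped in here for illustration; the slides do not prescribe it. The classes Replica and ReplicatedStore and the site names are hypothetical.

```python
class Replica:
    """One copy of the data at one site (in-memory stand-in)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True  # False models a site failure


class ReplicatedStore:
    """Toy read-one/write-all store over a list of replicas."""

    def __init__(self, replicas):
        self.replicas = replicas

    def read(self, key):
        # Choice of copy to be read: here simply the first available one.
        for replica in self.replicas:
            if replica.up:
                return replica.data.get(key)
        raise RuntimeError("no replica available")

    def write(self, key, value):
        # The update must take place on each copy; refuse it otherwise.
        down = [replica.name for replica in self.replicas if not replica.up]
        if down:
            raise RuntimeError(f"cannot update all copies, sites down: {down}")
        for replica in self.replicas:
            replica.data[key] = value


store = ReplicatedStore([Replica("London"), Replica("Tokio")])
store.write("x", 1)

store.replicas[0].up = False   # London fails
value = store.read("x")        # still served by the Tokio copy
try:
    store.write("x", 2)
    write_blocked = False
except RuntimeError:
    write_blocked = True
print(value, write_blocked)
```

Reads survive the site failure, but updates are blocked until every copy is reachable again. This tension between availability and consistent copies is one reason why the design is more complex than in the centralized case.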


Disadvantages (2)

● Complexity: all intricacies known from centralized DBMS plus additional problems, see rest of this course.
● Costs: not only hardware, but also software, communication, staff
● Control is decentralized: coordination?
● Data security: more components that require protection, and network in addition.


Query Processing (1)

● Query processing – not the same as read operations.
● What differences in the distributed case?
  – Replication, fragmentation etc. → query processing must be extended
  – Such extensions
    ● are either straightforward,
    ● or are 'interesting', e.g., semi-joins, but impact tends to be low.
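As an illustration of the semi-join idea mentioned above, the following sketch joins two relations that are assumed to live at different sites and counts how many items have to be shipped over the network. The relations, the attribute names, and the simple cost measure (one unit per shipped value or tuple) are hypothetical.

```python
def join(r, s, key):
    """Plain nested-loop join of two lists of dicts on `key`."""
    return [{**a, **b} for a in r for b in s if a[key] == b[key]]


def ship_whole_strategy(r, s, key):
    """Ship all of s to the site of r, then join there."""
    return join(r, s, key), len(s)


def semi_join_strategy(r, s, key):
    """Reduce s with a semi-join before shipping it to the site of r."""
    # Step 1 (site of r): project r onto the join attribute and ship it.
    join_values = {row[key] for row in r}
    # Step 2 (site of s): keep only the s-tuples that have a join partner.
    s_reduced = [row for row in s if row[key] in join_values]
    # Step 3 (site of r): ship the reduced s back and compute the join.
    shipped = len(join_values) + len(s_reduced)
    return join(r, s_reduced, key), shipped


# Hypothetical data: two employees at one site, six departments at another.
employees = [{"dept": 1, "emp": "Klemens"}, {"dept": 2, "emp": "Erik"}]
departments = [{"dept": d, "dname": f"Dept {d}"} for d in range(1, 7)]

full_result, full_cost = ship_whole_strategy(employees, departments, "dept")
semi_result, semi_cost = semi_join_strategy(employees, departments, "dept")
print(full_cost, semi_cost, full_result == semi_result)
```

Both strategies produce the same join result. With this data the semi-join ships 4 items instead of 6; when most tuples of the second relation have a join partner, the extra round trip brings no saving.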


Query Processing (2)

● Slightly different perspective: Distributed sources are around, e.g., in enterprises.
  – Queries over several/many of such sources.
  – Solutions: Federated database systems, mediator-based architectures.
● Scenario behind more recent developments:
  – Sources are highly distributed, e.g., WWW,
  – queries over several/many such sources.


Query Processing (3)

● Which changes?
  – Availability of sources not clear, may change at any time.
  – In general, semantics of data rather unclear to users.
  – Cost issues – frequencies of data values are less obvious; construction of histograms not feasible in general.
● Topics in this course: Continuous QP, Online QP.


Structure of the Course


Structure of this Course (1)

● Introduction: motivation, architecture of distributed DBMSs, brief intro to concurrency control,
● Distributed transactions: atomic commit protocols, 2PC vs. 3PC; optimizations, e.g., presumed abort; transactions in federated databases, tickets,
● Replication: transactional guarantees in distributed systems with replication, replication in presence of site failures and communication failures, lazy replication, epidemic protocols for update propagation,


Structure of this Course (2)

● Caching: semantic caching, cache consistency and cache coherency, distributed caching, prefetching,
● Query processing in distributed environments (with a focus on open, unreliable settings such as the Internet): continuous query processing, first-few queries and user-adaptive query evaluation.
● Data management in sensor networks: query processing in unreliable scenarios where many independent nodes process sensor data; optimization of communication processes
