Information Retrieval and Use

Denormalisation and Distributed database systems

Geoff Leese September 2008, revised October 2009

Mapping the logical model onto physical design

- Entities become tables (more often than not!)
- Attributes become fields (columns)
- Unique identifiers become primary keys
- Relationships implemented by foreign key columns
- Resolve M:N relationships by inserting an intersection table
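The mapping rules above can be sketched in SQL, run here through Python's built-in `sqlite3` module. The Customer/Orders/Product schema is a hypothetical example, not one taken from the lecture: the 1:N relationship between Customer and Orders is carried by a foreign key, and the M:N relationship between Orders and Product is resolved by the intersection table `OrderLine`, whose primary key combines both foreign keys. (`Orders` is plural only because ORDER is an SQL keyword.)

```python
import sqlite3

# Hypothetical example schema: Customer 1:N Orders, Orders M:N Product.
# The M:N relationship is resolved by the intersection table OrderLine.
SCHEMA = """
CREATE TABLE Customer (
    customer_id INTEGER PRIMARY KEY,   -- unique identifier -> primary key
    name        TEXT NOT NULL          -- attribute -> column
);
CREATE TABLE Product (
    product_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    unit_price  REAL NOT NULL
);
CREATE TABLE Orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customer(customer_id)  -- 1:N via foreign key
);
CREATE TABLE OrderLine (                -- intersection table for Orders M:N Product
    order_id   INTEGER NOT NULL REFERENCES Orders(order_id),
    product_id INTEGER NOT NULL REFERENCES Product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)  -- composite key built from both parents
);
"""

def build_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript(SCHEMA)
    return conn

conn = build_schema()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['Customer', 'OrderLine', 'Orders', 'Product']
```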

Mapping considerations

- Independence
- Privacy
- Efficiency of queries

Denormalisation

- Joins take time!
- Split or merge normalised entities based on frequent associated use
- Remove redundant relationships
- Merge entities with 1:1 relationships
- Use summary fields
- Use summary tables and views

Using a summary field (1)

Consider running the query “give the total value of all orders for customer X”.

How many joins?

Using a summary field (2)

Note the summary field in the Orders table.

How many joins now?
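The slides showed the tables as a diagram that is not reproduced here, so the following is a sketch on an assumed schema: the `Product`, `Orders` and `OrderLine` tables and the summary field `order_total` are hypothetical. Without the summary field, totalling a customer's orders takes two joins to reach the prices; with it, the query reads `Orders` alone. The trigger shows the price of denormalising: every change to the detail rows must also update the summary field (only inserts are handled here, to keep the sketch short).

```python
import sqlite3

# Hypothetical schema; order_total is the denormalised summary field in Orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product   (product_id INTEGER PRIMARY KEY, unit_price REAL NOT NULL);
CREATE TABLE Orders    (order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL,
                        order_total REAL NOT NULL DEFAULT 0);  -- summary field
CREATE TABLE OrderLine (order_id INTEGER NOT NULL, product_id INTEGER NOT NULL,
                        quantity INTEGER NOT NULL, PRIMARY KEY (order_id, product_id));

-- Keep the summary field in step with its detail rows (inserts only, for brevity).
CREATE TRIGGER orderline_insert AFTER INSERT ON OrderLine
BEGIN
    UPDATE Orders
    SET order_total = order_total + NEW.quantity *
        (SELECT unit_price FROM Product WHERE product_id = NEW.product_id)
    WHERE order_id = NEW.order_id;
END;
""")

conn.executemany("INSERT INTO Product VALUES (?, ?)", [(1, 2.50), (2, 10.00)])
conn.executemany("INSERT INTO Orders (order_id, customer_id) VALUES (?, ?)",
                 [(100, 7), (101, 7), (102, 8)])
conn.executemany("INSERT INTO OrderLine VALUES (?, ?, ?)",
                 [(100, 1, 4), (100, 2, 1), (101, 2, 3), (102, 1, 1)])

# Normalised version: two joins are needed to reach the prices.
normalised = conn.execute("""
    SELECT SUM(ol.quantity * p.unit_price)
    FROM Orders o
    JOIN OrderLine ol ON ol.order_id = o.order_id
    JOIN Product p    ON p.product_id = ol.product_id
    WHERE o.customer_id = ?""", (7,)).fetchone()[0]

# Denormalised version: no joins, just sum the summary field.
summary = conn.execute(
    "SELECT SUM(order_total) FROM Orders WHERE customer_id = ?", (7,)).fetchone()[0]

print(normalised, summary)  # 50.0 50.0
```

In a real system the trigger would also have to cover updates and deletes on `OrderLine`; otherwise the summary field drifts out of step with the rows it summarises.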

Distributed database systems

Special rules apply!

The traditional model

- One centralised database
- Terminals at remote locations
- Disadvantages:
  - Networks are slow (especially WANs!)
  - The central machine does all the processing
  - If the central machine fails, the database is down

(Integrity, redundancy and disaster recovery considered in later lectures!)

The Client/Server model

- Client: application (“front end”)
- Server: DBMS (“back end”)
- Still dependent on a central database

Client responsibilities

- Manages the user interface
- Accepts user data
- Has local processing capability within the application
- Generates database requests and transmits them via the network to the server
- Receives results from the server and formats them as required by the application

Server responsibilities

- Accepts database requests from the client
- Processes database requests
- Handles security issues
- Deals with concurrency issues
- Optimises queries
- Handles recovery/rollback issues
- Returns results to the client
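The division of labour between client and server can be sketched in a few lines of Python. This is an illustrative assumption, not a real client/server protocol: the network is stood in for by JSON strings passed between two objects, and `Server`, `handle_request` and `client_lookup` are hypothetical names. The server owns the database connection, processes each request and rolls back on error; the client only builds requests and formats replies. Security, concurrency control and query optimisation are left to the DBMS and are not shown.

```python
import json
import sqlite3

class Server:
    """'Back end': owns the DBMS connection (hypothetical sketch)."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript("""
            CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
            INSERT INTO Customer VALUES (1, 'Ada'), (2, 'Grace');
        """)

    def handle_request(self, message: str) -> str:
        """Accept a serialised request, process it, return a serialised reply."""
        request = json.loads(message)
        try:
            with self.conn:  # commits on success, rolls back if an exception escapes
                rows = self.conn.execute(request["sql"], request["params"]).fetchall()
            return json.dumps({"ok": True, "rows": rows})
        except sqlite3.Error as exc:
            return json.dumps({"ok": False, "error": str(exc)})

def client_lookup(server: Server, customer_id: int) -> str:
    """'Front end': builds the request and formats the result for the user."""
    message = json.dumps({"sql": "SELECT name FROM Customer WHERE customer_id = ?",
                          "params": [customer_id]})
    reply = json.loads(server.handle_request(message))  # stands in for a network round trip
    if not reply["ok"]:
        return f"Error: {reply['error']}"
    return reply["rows"][0][0] if reply["rows"] else "No such customer"

server = Server()
print(client_lookup(server, 2))  # Grace
print(client_lookup(server, 9))  # No such customer
```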

Distributed database architecture

A collection of logically related “sites”, connected together so that the user’s view is that of a single database at a single location.

- Each site is a database in its own right
- Sites are not necessarily physically or geographically separated (though they often are), but they are always logically separated

Advantages

- Organisations are distributed, so why shouldn’t their data be?
- Improved efficiency: store data close to where it’s used

Types of DDS

Homogeneous – same type of RDBMS at each site (easy!)

Heterogeneous – different types of DBMS at each site (not so easy!)

Implementation methods (1)

Fragmentation: splitting data between sites
- Horizontal (row based), e.g. store all employee records for a location at that location
- Vertical (column based), e.g. store all payroll columns in the payroll department and all other employee data in HR

Either way, fragments must be able to be put back together!
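Both kinds of fragmentation, and how the fragments are put back together, can be sketched with `sqlite3`. Each “site” is simulated by a separate in-memory database attached to one connection; this is an assumption for illustration only, since in a real DDS the sites would be separate servers and the distributed DBMS would do the reassembly. The site names and employee data are hypothetical. Horizontal fragments are recombined with a union (UNION ALL suffices because the fragments should not overlap), and vertical fragments with a join on the primary key, which therefore has to be stored in every vertical fragment.

```python
import sqlite3

# Each "site" is simulated by a separate attached in-memory database.
conn = sqlite3.connect(":memory:")
for site in ("leeds", "york", "payroll", "hr"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {site}")

employees = [(1, "Ann", "Leeds", 31000), (2, "Bob", "York", 28000), (3, "Cy", "Leeds", 35000)]

# Horizontal fragmentation: each row is stored at the site where the employee works.
for site in ("leeds", "york"):
    conn.execute(f"CREATE TABLE {site}.Employee "
                 "(emp_id INTEGER PRIMARY KEY, name TEXT, location TEXT, salary INTEGER)")
for emp in employees:
    conn.execute(f"INSERT INTO {emp[2].lower()}.Employee VALUES (?, ?, ?, ?)", emp)

# Vertical fragmentation: payroll columns in payroll, the rest in HR.
# The primary key is repeated in both fragments so they can be rejoined.
conn.execute("CREATE TABLE payroll.Pay (emp_id INTEGER PRIMARY KEY, salary INTEGER)")
conn.execute("CREATE TABLE hr.Staff (emp_id INTEGER PRIMARY KEY, name TEXT, location TEXT)")
conn.executemany("INSERT INTO payroll.Pay VALUES (?, ?)", [(e[0], e[3]) for e in employees])
conn.executemany("INSERT INTO hr.Staff VALUES (?, ?, ?)", [e[:3] for e in employees])

# Reconstruction: union for horizontal fragments, join on the key for vertical ones.
from_horizontal = conn.execute("""
    SELECT emp_id, name, location, salary FROM leeds.Employee
    UNION ALL
    SELECT emp_id, name, location, salary FROM york.Employee
    ORDER BY emp_id""").fetchall()
from_vertical = conn.execute("""
    SELECT s.emp_id, s.name, s.location, p.salary
    FROM hr.Staff s JOIN payroll.Pay p ON p.emp_id = s.emp_id
    ORDER BY s.emp_id""").fetchall()

print(from_horizontal == employees, from_vertical == employees)  # True True
```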

Implementation methods (2)

Replication: controlled duplication of data at more than one site
- Update propagation?

Objectives (1)

- Local autonomy
  - Local data locally owned and managed, with minimal data requirements from remote sites
- No reliance on a central site
- Continuous operation
  - Reliability
  - Availability

Objectives (2)

- Location independence: from the user’s view, all data is at their site
- Fragmentation independence: needs joins and unions to put fragments back together
- Replication independence

Objectives (3)

- Distributed query processing
- Distributed transaction management
  - Transactions carried out by “agents” at distributed sites
  - Two-phase commit
  - Locking issues (later lecture)
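The two-phase commit protocol can be outlined as follows. This is a simplified, in-memory sketch: the `Participant` class and `two_phase_commit` function are hypothetical, each participant stands for the transaction agent at one site, and the `can_commit` flag simulates whether its local work succeeded. Real implementations also write each step to a durable log and must cope with timeouts and coordinator failure, none of which is modelled here.

```python
from dataclasses import dataclass, field

@dataclass
class Participant:
    """A site's transaction agent (hypothetical, in-memory only)."""
    name: str
    can_commit: bool = True          # simulates whether the local work succeeded
    state: str = "active"
    log: list[str] = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1: the site votes; a yes vote promises it can commit if told to.
        if self.can_commit:
            self.state = "prepared"
            self.log.append("PREPARED")
            return True
        self.state = "aborted"
        self.log.append("ABORT")
        return False

    def commit(self) -> None:
        self.state = "committed"
        self.log.append("COMMIT")

    def abort(self) -> None:
        self.state = "aborted"
        self.log.append("ABORT")


def two_phase_commit(participants: list[Participant]) -> str:
    """Coordinator: commit only if every participant votes yes."""
    # Phase 1 (voting): ask every site to prepare.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2 (completion): send the single global decision to every site.
    for p in participants:
        if decision == "commit":
            p.commit()
        elif p.state != "aborted":
            p.abort()
    return decision

sites = [Participant("leeds"), Participant("york", can_commit=False)]
print(two_phase_commit(sites), [p.state for p in sites])  # abort ['aborted', 'aborted']
```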

Objectives (4)

- Hardware independence
- Operating system independence
- Network independence
- DBMS independence

DDS issues

- Query processing: optimisation is even more important
- Catalogue (data dictionary) management: centralised? Fully replicated? Partitioned? A combination of the first and third?

DDS issues

- Update propagation: an issue where replication is used; “primary copy” system
- Recovery: two-phase commit
- Concurrency: locking strategies
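A “primary copy” scheme for update propagation can be sketched for a single data item. The `ReplicatedItem` class and site names are hypothetical; the sketch assumes one designated primary site that alone accepts updates and then pushes each new value and version number to the other copies, while any site may read its local copy. Propagation is synchronous here for simplicity; a real system might propagate asynchronously and accept that replicas can briefly be out of date.

```python
class ReplicatedItem:
    """Primary-copy replication of one data item across sites (hypothetical sketch)."""

    def __init__(self, primary: str, replicas: list[str], value: int) -> None:
        self.primary = primary
        self.copies = {site: value for site in [primary, *replicas]}
        self.version = {site: 0 for site in self.copies}

    def update(self, site: str, value: int) -> None:
        # Only the primary copy may be updated directly.
        if site != self.primary:
            raise PermissionError(f"{site} must route updates via primary {self.primary}")
        self.copies[site] = value
        self.version[site] += 1
        self._propagate()

    def _propagate(self) -> None:
        # Push the primary's value and version to every other copy (synchronously here).
        for site in self.copies:
            if site != self.primary:
                self.copies[site] = self.copies[self.primary]
                self.version[site] = self.version[self.primary]

    def read(self, site: str) -> int:
        return self.copies[site]  # any site may read its local copy

stock = ReplicatedItem(primary="leeds", replicas=["york", "hull"], value=10)
stock.update("leeds", 7)
print(stock.read("hull"))  # 7
```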

Summary

- Mapping the logical model
- Denormalisation
- Traditional database architecture
- Client/server model
- Distributed database systems
  - Advantages
  - Objectives
  - Implementation methods
  - Issues

Further reading

- Rolland, chapter 10
- Hoffer, chapter 12
- Denormalisation: http://www.sum-it.nl/cursus/dbdesign/english/techn040.htm