Information Retrieval and Use
Denormalisation and Distributed database systems
Geoff Leese, September 2008, revised October 2009
Mapping the logical model onto physical design
Entities become tables (more often than not!)
Attributes become fields (columns)
Unique identifiers become primary keys
Relationships are implemented by foreign key columns
Resolve M:N relationships by inserting an intersection table
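To make the mapping rules concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The Student/Module/Enrolment model is a hypothetical example, not one from the lecture: each entity becomes a table, attributes become columns, unique identifiers become primary keys, and the M:N relationship is resolved by an intersection table whose foreign keys reference both sides.

```python
import sqlite3

# Hypothetical logical model (not from the slides): a Student takes many
# Modules and a Module has many Students, an M:N relationship that is
# resolved by the intersection table "enrolment".
SCHEMA = """
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,   -- unique identifier -> primary key
    name       TEXT NOT NULL          -- attribute -> column
);
CREATE TABLE module (
    module_code TEXT PRIMARY KEY,
    title       TEXT NOT NULL
);
CREATE TABLE enrolment (              -- intersection table
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    module_code TEXT    NOT NULL REFERENCES module(module_code),
    PRIMARY KEY (student_id, module_code)
);
"""


def build_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = build_database()
    conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
    conn.executemany("INSERT INTO module VALUES (?, ?)",
                     [("IRU", "Information Retrieval and Use"), ("DB1", "Databases")])
    conn.executemany("INSERT INTO enrolment VALUES (?, ?)",
                     [(1, "IRU"), (1, "DB1"), (2, "IRU")])
    # Two joins through the intersection table: who takes IRU?
    rows = conn.execute("""
        SELECT s.name
        FROM student s
        JOIN enrolment e ON e.student_id = s.student_id
        JOIN module m    ON m.module_code = e.module_code
        WHERE m.module_code = 'IRU'
        ORDER BY s.name
    """).fetchall()
    print(rows)  # [('Asha',), ('Ben',)]
```

The composite primary key on the intersection table also stops the same student being enrolled on the same module twice.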
Mapping considerations
Independence
Privacy
Efficiency of queries
Denormalisation
Joins take time!
Split or merge normalised entities based on how frequently they are used together
Remove redundant relationships
Merge entities with 1:1 relationships
Use summary fields
Use summary tables and views
Using summary fields (1)
Consider running the query “give the total value of all orders for customer X”.
How many joins?
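The slide's schema diagram is not reproduced here, so this sketch assumes a hypothetical fully normalised design of four tables (customer, orders, order_line, product) in an in-memory SQLite database. Because the value of an order is never stored, answering the query means deriving it from the order lines and product prices every time, which takes three joins.

```python
import sqlite3

# Hypothetical normalised schema; the slide's own diagram is not reproduced.
SCHEMA = """
CREATE TABLE customer   (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders     (order_id    INTEGER PRIMARY KEY,
                         customer_id INTEGER NOT NULL REFERENCES customer(customer_id));
CREATE TABLE product    (product_id  INTEGER PRIMARY KEY, unit_price REAL NOT NULL);
CREATE TABLE order_line (order_id    INTEGER NOT NULL REFERENCES orders(order_id),
                         product_id  INTEGER NOT NULL REFERENCES product(product_id),
                         quantity    INTEGER NOT NULL,
                         PRIMARY KEY (order_id, product_id));
"""

# The order value is derived on every run: three joins
# (customer -> orders -> order_line -> product).
TOTAL_FOR_CUSTOMER = """
SELECT COALESCE(SUM(ol.quantity * p.unit_price), 0)
FROM customer c
JOIN orders     o  ON o.customer_id = c.customer_id
JOIN order_line ol ON ol.order_id   = o.order_id
JOIN product    p  ON p.product_id  = ol.product_id
WHERE c.name = ?
"""


def sample_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "X"), (2, "Y")])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
    conn.executemany("INSERT INTO product VALUES (?, ?)", [(100, 2.50), (101, 10.00)])
    conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                     [(10, 100, 4), (10, 101, 1), (11, 101, 2), (12, 100, 1)])
    return conn


def total_for_customer(conn: sqlite3.Connection, name: str) -> float:
    return conn.execute(TOTAL_FOR_CUSTOMER, (name,)).fetchone()[0]


if __name__ == "__main__":
    print(total_for_customer(sample_database(), "X"))  # 40.0
```

If the query were phrased by customer number rather than name, the customer table could be skipped, leaving two joins.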
Using summary fields (2)
Note the summary field in the Orders table.
How many joins now?
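A minimal sketch of the denormalised version, again assuming a hypothetical schema: the orders table gains a summary field, order_total, so the query needs only one join (customer to orders). To keep it short, each order line stores its own line_value instead of referencing a product table. The trigger shows the cost of the technique: the redundant value must be maintained on every change, and only inserts are handled here.

```python
import sqlite3

# Hypothetical denormalised schema: orders carries a summary field,
# order_total. Each order line stores its own line_value to keep the
# sketch short (no product table).
SCHEMA = """
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_total REAL    NOT NULL DEFAULT 0       -- summary field
);
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL,
    line_value REAL    NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
-- The price of the summary field: it must be kept up to date.
-- Only inserts are handled here; UPDATE and DELETE would need triggers too.
CREATE TRIGGER order_line_added AFTER INSERT ON order_line
BEGIN
    UPDATE orders SET order_total = order_total + NEW.line_value
    WHERE order_id = NEW.order_id;
END;
"""

# One join now: customer -> orders.
TOTAL_FOR_CUSTOMER = """
SELECT COALESCE(SUM(o.order_total), 0)
FROM customer c
JOIN orders o ON o.customer_id = c.customer_id
WHERE c.name = ?
"""


def sample_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "X"), (2, "Y")])
    conn.executemany("INSERT INTO orders (order_id, customer_id) VALUES (?, ?)",
                     [(10, 1), (11, 1), (12, 2)])
    conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                     [(10, 100, 10.0), (10, 101, 10.0), (11, 101, 20.0), (12, 100, 2.5)])
    return conn


def total_for_customer(conn: sqlite3.Connection, name: str) -> float:
    return conn.execute(TOTAL_FOR_CUSTOMER, (name,)).fetchone()[0]


if __name__ == "__main__":
    print(total_for_customer(sample_database(), "X"))  # 40.0
```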
Distributed database systems
Special rules apply!
The traditional model
One centralised database
Terminals at remote locations
Disadvantages:
Networks are slow (especially WANs!)
The central machine does all the processing
If the central machine fails, the database is down
(Integrity, redundancy and disaster recovery are considered in later lectures!)
The Client/Server model
Client: the application (the “front end”)
Server: the DBMS (the “back end”)
Still dependent on a central database
Client responsibilities
Manages the user interface
Accepts user data
Has local processing capability within the application
Generates database requests and transmits them via the network to the server
Receives results from the server and formats them as required by the application
Server responsibilities
Accepts database requests from the client
Processes database requests
Handles security issues
Deals with concurrency issues
Optimises queries
Handles recovery/rollback issues
Returns results to the client
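The division of labour between client and server can be sketched in a few lines of Python. This is a simplified, in-process illustration under stated assumptions: the Server and Client classes and the employee table are hypothetical, JSON messages stand in for the network, and the server's `with self.conn:` block provides commit on success and rollback on error. Security, concurrency control and query optimisation are left to the real DBMS and are not shown.

```python
import json
import sqlite3


class Server:
    """Back end (hypothetical): owns the DBMS connection and processes requests."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript("""
            CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, location TEXT);
            INSERT INTO employee VALUES (1, 'Asha', 'Leeds'), (2, 'Ben', 'York');
        """)

    def handle(self, message: str) -> str:
        request = json.loads(message)                 # accept the request
        try:
            with self.conn:                           # commit on success, roll back on error
                rows = self.conn.execute(request["sql"], request["params"]).fetchall()
            reply = {"ok": True, "rows": rows}
        except sqlite3.Error as exc:
            reply = {"ok": False, "error": str(exc)}
        return json.dumps(reply)                      # return results to the client


class Client:
    """Front end (hypothetical): builds requests and formats replies for the user."""

    def __init__(self, server: Server) -> None:
        self.server = server                          # stands in for a network connection

    def employees_at(self, location: str) -> list[str]:
        message = json.dumps({
            "sql": "SELECT name FROM employee WHERE location = ? ORDER BY name",
            "params": [location],
        })
        reply = json.loads(self.server.handle(message))
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        # Format the raw rows as the application needs them.
        return [f"Employee: {name}" for (name,) in reply["rows"]]
```

For example, `Client(Server()).employees_at("Leeds")` returns `["Employee: Asha"]`. Note that every request still goes to one server, which is why this model remains dependent on a central database.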
Distributed database architecture
A collection of logically related “sites”, connected together so that the user’s view is that of a single database at a single location.
Each site is a database in its own right.
Sites are not necessarily physically or geographically separated (though they often are), but they are logically separated.
Advantages
Organisations are distributed, so why shouldn’t their data be?
Improved efficiency: store data close to where it’s used
Types of DDS
Homogeneous: the same type of RDBMS at each site (easy!)
Heterogeneous: different types of DBMS at each site (not so easy!)
Implementation methods (1)
Fragmentation: splitting data between sites
Horizontal (row based): e.g. store all employee records for a location at that location
Vertical (column based): e.g. store all payroll columns in the payroll department and all other employee data in HR
Either way, it must be possible to put the fragments back together!
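The two kinds of fragmentation, and how each is reassembled, can be sketched with plain Python lists of rows. The EMPLOYEES data and function names are hypothetical. Horizontal fragments are recombined with a union; vertical fragments are recombined with a join, which only works because every vertical fragment keeps a copy of the primary key (emp_id here).

```python
# Hypothetical EMPLOYEE relation; emp_id is the primary key.
EMPLOYEES = [
    {"emp_id": 1, "name": "Asha", "location": "Leeds", "salary": 31000},
    {"emp_id": 2, "name": "Ben",  "location": "York",  "salary": 28000},
    {"emp_id": 3, "name": "Cara", "location": "Leeds", "salary": 35000},
]


def horizontal_fragments(rows, key):
    """Row based: each site stores the rows that belong to it."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments


def rebuild_horizontal(fragments):
    """Put horizontal fragments back together with a union."""
    return [row for site_rows in fragments.values() for row in site_rows]


def vertical_fragment(rows, columns, pk="emp_id"):
    """Column based: keep only some columns, always including the primary key."""
    return [{c: row[c] for c in (pk, *columns)} for row in rows]


def rebuild_vertical(left, right, pk="emp_id"):
    """Put vertical fragments back together with a join on the primary key."""
    by_key = {row[pk]: row for row in right}
    return [{**row, **by_key[row[pk]]} for row in left if row[pk] in by_key]


if __name__ == "__main__":
    sites = horizontal_fragments(EMPLOYEES, "location")      # Leeds and York fragments
    payroll = vertical_fragment(EMPLOYEES, ["salary"])          # payroll department
    hr = vertical_fragment(EMPLOYEES, ["name", "location"])     # HR
    print(rebuild_vertical(hr, payroll) == EMPLOYEES)           # True
```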
Implementation methods (2)
Replication: controlled duplication of data at more than one site
Update propagation?
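A minimal sketch of why update propagation matters, assuming a hypothetical ReplicatedTable class that keeps a full copy of the data at each site. Reads are served from the local copy. The write method uses eager (synchronous) propagation, updating every copy before it returns, while write_local_only shows what happens without propagation: the copies no longer agree. In a real system the synchronous update would itself need distributed transaction management so that it either succeeds at every site or at none.

```python
class ReplicatedTable:
    """Hypothetical: one logical table held as a full copy at several sites."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}

    def read(self, site, key):
        # Served from the local copy: no network trip needed.
        return self.copies[site].get(key)

    def write(self, key, value):
        # Eager (synchronous) propagation: every copy is updated
        # before the write is considered complete.
        for copy in self.copies.values():
            copy[key] = value

    def write_local_only(self, site, key, value):
        # No propagation: the other copies are now stale.
        self.copies[site][key] = value

    def consistent(self):
        values = list(self.copies.values())
        return all(v == values[0] for v in values)


if __name__ == "__main__":
    stock = ReplicatedTable(["Leeds", "York"])
    stock.write("widget", 10)
    print(stock.read("York", "widget"), stock.consistent())   # 10 True
    stock.write_local_only("Leeds", "widget", 12)
    print(stock.consistent())                                # False
```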
Objectives (1)
Local autonomy: local data is locally owned and managed, with minimal data requirements from remote sites
No reliance on a central site
Continuous operation: reliability and availability
Objectives (2)
Location independence: from the user’s view, all data is at their site
Fragmentation independence: the system performs the joins and unions needed to put fragments back together
Replication independence
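One way to picture location and fragmentation independence is a small query layer that consults a catalogue instead of asking the user where data lives. In this sketch the CATALOGUE, SITES and function names are hypothetical, and fetch_from_site stands in for a remote request. The user's call to global_scan names only the table; the catalogue supplies the sites, and the fragments are combined with a union.

```python
# Hypothetical catalogue: which sites hold fragments of each table.
CATALOGUE = {
    "employee": ["leeds", "york"],   # horizontally fragmented by location
}

# Hypothetical local databases at each site.
SITES = {
    "leeds": {"employee": [(1, "Asha"), (3, "Cara")]},
    "york":  {"employee": [(2, "Ben")]},
}


def fetch_from_site(site, table):
    """Stands in for a remote request to one site's DBMS."""
    return SITES[site][table]


def global_scan(table):
    """What the user calls: no site names appear in the request.

    The catalogue supplies the locations (location independence) and the
    fragments are combined with a union (fragmentation independence).
    """
    rows = []
    for site in CATALOGUE[table]:
        rows.extend(fetch_from_site(site, table))
    return sorted(rows)


if __name__ == "__main__":
    print(global_scan("employee"))   # [(1, 'Asha'), (2, 'Ben'), (3, 'Cara')]
```

If a fragment moves to another site, only the catalogue entry changes; the user's request stays the same.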
Objectives (3)
Distributed query processing
Distributed transaction management: transactions are carried out by “agents” at the distributed sites
Two-phase commit
Locking issues (later lecture)
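Two-phase commit can be sketched as a coordinator talking to the agents at each site. This is a simplified illustration with hypothetical Agent and two_phase_commit names: in phase 1 (voting) every agent prepares its part of the transaction and votes; in phase 2 (decision) the coordinator commits everywhere only if every vote was COMMIT, and otherwise rolls back everywhere. Logging to stable storage, timeouts and recovery after a crash are omitted.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Agent:
    """Hypothetical agent running one part of a transaction at a site."""

    def __init__(self, site, can_commit=True):
        self.site = site
        self.can_commit = can_commit     # simulates whether local work succeeded
        self.state = "active"

    def prepare(self):
        # Phase 1: a real agent would force its log to disk before voting COMMIT.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.COMMIT if self.can_commit else Vote.ABORT

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(agents):
    # Phase 1 (voting): ask every agent to prepare and vote.
    votes = [agent.prepare() for agent in agents]
    decision = all(vote is Vote.COMMIT for vote in votes)
    # Phase 2 (decision): every agent receives the same outcome.
    for agent in agents:
        if decision:
            agent.commit()
        else:
            agent.rollback()
    return decision
```

The key property is that the sites never disagree: a single ABORT vote means no site commits.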
Objectives (4)
Hardware independence
Operating system independence
Network independence
DBMS independence
DDS issues (1)
Query processing: optimisation is even more important, because data may have to be shipped across the network between sites
Catalogue (data dictionary) management: centralised? Fully replicated? Partitioned? A combination of the first and third?
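A small worked example shows why the choice of strategy matters. All sizes below are hypothetical, chosen only for illustration: the query “list the orders of customers located in London” is issued at site A, which holds the customer table, while the orders table is at site B. The sketch counts the bytes each of three plausible strategies would send over the network; the third filters at site A first and ships only the matching customer numbers to B.

```python
# Hypothetical sizes (not from the lecture): query issued at site A,
# "list the orders of customers located in London".
CUSTOMER_ROWS, CUSTOMER_ROW_BYTES = 10_000, 100      # held at site A
ORDER_ROWS, ORDER_ROW_BYTES = 1_000_000, 50          # held at site B
LONDON_CUSTOMERS = 100                               # customers passing the filter at A
LONDON_ORDERS = 10_000                               # their orders at site B
KEY_BYTES = 4                                        # size of one customer_id


def bytes_shipped():
    result_back_to_a = LONDON_ORDERS * ORDER_ROW_BYTES
    return {
        # Ship the whole orders table to A and join there.
        "ship orders to A": ORDER_ROWS * ORDER_ROW_BYTES,
        # Ship the whole customer table to B, join there, send the result back.
        "ship customers to B": CUSTOMER_ROWS * CUSTOMER_ROW_BYTES + result_back_to_a,
        # Filter at A first, ship only London customer ids, send the result back.
        "ship London ids to B": LONDON_CUSTOMERS * KEY_BYTES + result_back_to_a,
    }


if __name__ == "__main__":
    for strategy, cost in bytes_shipped().items():
        print(f"{strategy}: {cost:,} bytes")
```

With these figures the three strategies ship 50,000,000, 1,500,000 and 500,400 bytes respectively: the same answer at roughly a hundredth of the network cost of the worst plan.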
DDS issues (2)
Update propagation: an issue where replication is used; “primary copy” system
Recovery: two-phase commit
Concurrency: locking strategies
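The “primary copy” approach to update propagation can be sketched as follows, assuming a hypothetical PrimaryCopyTable class. Every update is routed to the copy at the primary site, whichever site issued it; the primary then propagates queued updates to the secondary copies. Reads may use any copy, so a secondary can return stale data until propagation has run.

```python
class PrimaryCopyTable:
    """Hypothetical primary-copy scheme for one replicated table."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.copies = {site: {} for site in (primary, *secondaries)}
        self.pending = []                    # updates not yet propagated

    def update(self, key, value):
        # Every update goes to the primary copy, whichever site issued it.
        self.copies[self.primary][key] = value
        self.pending.append((key, value))

    def propagate(self):
        # Apply queued updates to each secondary copy, in order.
        for key, value in self.pending:
            for site, copy in self.copies.items():
                if site != self.primary:
                    copy[key] = value
        self.pending.clear()

    def read(self, site, key):
        # Any copy can be read; a secondary may be stale before propagate().
        return self.copies[site].get(key)


if __name__ == "__main__":
    stock = PrimaryCopyTable("Leeds", ["York", "Hull"])
    stock.update("widget", 5)
    print(stock.read("York", "widget"))    # None (not yet propagated)
    stock.propagate()
    print(stock.read("York", "widget"))    # 5
```

Because only the primary accepts updates, conflicting changes cannot be made at two sites at once; the price is a window in which secondary copies lag behind.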
Summary
Mapping the logical model
Denormalisation
Traditional database architecture
Client/server model
Distributed database systems: advantages, objectives, implementation methods, issues
Further reading
Rolland, chapter 10
Hoffer, chapter 12
Denormalisation (web link)