Storage and File Organization. 11.2Database System Concepts General Overview - rel. model ...

40
Storage and File Organization Storage and File Organization
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    215
  • download

    0

Transcript of Storage and File Organization. 11.2Database System Concepts General Overview - rel. model ...

Page 1: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

Storage and File OrganizationStorage and File Organization

Page 2: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.2Database System Concepts

General Overview - rel. modelGeneral Overview - rel. model

Relational model - SQL

Formal & commercial query languages

Functional Dependencies

Normalization

Physical Design

Indexing

Page 3: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.3Database System Concepts

ReviewReview

DBMSs store data on disk

Disk characteristics: 2 orders of magnitude slower than MM

Unit of read/write operations a block/page (multiple of sectors)

Access time = seek time + rotation time + transfer time

Sequential I/O much faster than random I/O

Database Systems try to minimize the overhead of moving data from and to disk

Page 4: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.4Database System Concepts

ReviewReview

Methods to improve MM to/from disk transfers

1. Improve the disk technology

a. Use faster disks (more RPMs)

b. Parallelization (RAID) + some redundancy

2. Avoid unnecessary reads from disk

a. Buffer management: go to buffer (MM) instead of disk

b. Good file organization

3. Other (OS based improvements): disk scheduling (elevator algorithm, batch writes, etc)

Page 5: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.5Database System Concepts

Buffer ManagementBuffer Management

Keep pages in a part of memory (buffer), read directly from there

What happens if you need to bring a new page into buffer and buffer is full: you have to evict one page

Replacement policy: LRU : Least Recently Used (CLOCK)

MRU: Most Recently Used

Toss-immediate : remove a page if you know that you will not need it again

Pinning (needed in recovery, index based processing,etc)

Other DB specific RPs: DBMIN, LRU-k, 2Q

Page 6: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.6Database System Concepts

File OrganizationFile Organization

Basics A database is a collection of files, file is a collection of

records, record (tuple) is a collection of fields (attributes)

Files are stored on Disks (that use blocks to read and write)

Two important issues:

1. Representation of each record

2. Grouping/Ordering of records and storage in blocks

Page 7: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.7Database System Concepts

File OrganizationFile Organization

Goal and considerations:

Compactness

Overhead of insertion/deletion

Retrieval speed: sometime we prefer to bring more tuples than necessary in MM and use CPU to filter out the unnecessary ones!

Page 8: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.8Database System Concepts

Record RepresentationRecord Representation

Fixed-Length Records Example

Account( acc-number char(10), branch-name char(20), balance real)

Each record is 38 bytes.

Store them sequentially, one after the other

Record0 at position 0, record1 at position 38, record2 at position 76 etc

Compactness (350 bytes)

Page 9: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.9Database System Concepts

Fixed-Length RecordsFixed-Length Records

Simple approach: Store record i starting from byte n (i – 1), where n is the size of

each record.

Record access is simple but records may cross blocks Modification: do not allow records to cross block boundaries

Insertion of record i: Add at the end

Deletion of record i: Two alternatives: move records:

1. i + 1, . . ., n to i, . . . , n – 1

2. record n to i

do not move records, but link all free records on a free list

Page 10: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.10Database System Concepts

Free ListsFree Lists 2nd approach: FLR with Free Lists Store the address of the first deleted record in the file header. Use this first record to store the address of the second deleted record,

and so on Can think of these stored addresses as pointers since they “point” to

the location of a record. More space efficient representation: reuse space for normal attributes

of free records to store pointers. (No pointers stored in in-use records.)

Better handling ins/del

Less compact: 420 bytes

Page 11: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.11Database System Concepts

Variable-Length RecordsVariable-Length Records

3rd approach: Variable-length records arise in database systems in several ways: Storage of multiple record types in a file.

Record types that allow variable lengths for one or more fields.

Record types that allow repeating fields or multivalued attribute.

Byte string representation Attach an end-of-record () control character to the end of each

record

Difficulty with deletion (leaves holes)

Difficulty with growth

4

FieldCount

R1 R2 R3

Page 12: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.12Database System Concepts

Variable-Length Records: Slotted Page Variable-Length Records: Slotted Page StructureStructure

4th approach VLR-SP

Slotted page header contains: number of record entries

end of free space in the block

location and size of each record

Records stored at the bottom of the page

External tuple pointers point to record ptrs: rec-id = <page-id, slot#>

Page 13: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.13Database System Concepts

Page iRid = (i,N)

Rid = (i,2)

Rid = (i,1)

Pointerto startof freespace

SLOT DIRECTORY

N . . . 2 120 16 24 N

# slots

Insertion: 1) Use FP to find space and insert 2) Find available ptr in the directory (or create a new one) 3) adjust FP and number of records

Deletion ?

Page 14: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.14Database System Concepts

Variable-Length Records (Cont.)Variable-Length Records (Cont.) Fixed-length representation:

reserved space

pointers

5th approach: Fixed Limit Records (for VLR)

Reserved space – can use fixed-length records of a known maximum length; unused space in shorter records filled with a null or end-of-record symbol.

Page 15: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.15Database System Concepts

Pointer MethodPointer Method

6th approach: Pointer method Pointer method

A variable-length record is represented by a list of fixed-length records, chained together via pointers.

Can be used even if the maximum record length is not known

Page 16: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.16Database System Concepts

Pointer Method (Cont.)Pointer Method (Cont.) Disadvantage to pointer structure; space is wasted in all

records except the first in a a chain.

Solution is to allow two kinds of block in file: Anchor block – contains the first records of chain

Overflow block – contains records other than those that are the first records of chairs.

Page 17: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.17Database System Concepts

Ordering and Grouping recordsOrdering and Grouping records

Issue #1:

In what order we place records in a block?

1. Heap technique: assign anywhere there is space

2. Ordered technique: maintain an order on some attribute

So, we can use binary search if selection on this attribute.

Page 18: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.18Database System Concepts

Sequential File OrganizationSequential File Organization Suitable for applications that require sequential

processing of the entire file

The records in the file are ordered by a search-key

Page 19: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.19Database System Concepts

Sequential File Organization (Cont.)Sequential File Organization (Cont.) Deletion – use pointer chains

Insertion –locate the position where the record is to be inserted if there is free space insert there

if no free space, insert the record in an overflow block

In either case, pointer chain must be updated

Need to reorganize the file from time to time to restore sequential order

Page 20: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.20Database System Concepts

Clustering File OrganizationClustering File Organization Simple file structure stores each relation in a separate file Can instead store several relations in one file using a

clustering file organization E.g., clustering organization of customer and depositor:

good for queries involving depositor customer, and for queries involving one single customer and his accounts

bad for queries involving only customer results in variable size records

SELECT account-number, customer-nameFROM depositor d, account aWHERE d.customer-name = a.customer-name

Page 21: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.21Database System Concepts

File organizationFile organization

Issue #2: In which blocks should records be placed

Many alternatives exist, each ideal for some situation , and not so good in others:

Heap files: Add at the end of the file.Suitable when typical access is a file scan retrieving all records.

Sorted Files:Keep the pages ordered. Best if records must be retrieved in some order, or only a `range’ of records is needed.

Hashed Files: Good for equality selections. Assign records to blocks according to their value for some attribute

Page 22: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.22Database System Concepts

Data Dictionary StorageData Dictionary Storage

Information about relations names of relations names and types of attributes of each relation names and definitions of views integrity constraints

User and accounting information, including passwords Statistical and descriptive data

number of tuples in each relation

Physical file organization information How relation is stored (sequential/hash/…) Physical location of relation

operating system file name or disk addresses of blocks containing records of the relation

Information about indices (Chapter 12)

Data dictionary (also called system catalog) stores metadata: that is, data about data, such as

Page 23: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.23Database System Concepts

Data dictionary storageData dictionary storage

Stored as tables!!

E-R diagram? Relations, attributes, domains

Each relation has name, some attributes

Each attribute has name, length and domain

Also, views, integrity constraints, indices

User info (authorizations etc)

statistics

Page 24: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.24Database System Concepts

relation

name

attribute

A-name

has

1 N

domain

position

Page 25: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.25Database System Concepts

Data Dictionary Storage (Cont.)Data Dictionary Storage (Cont.)

A possible catalog representation:

Relation-metadata = (relation-name, number-of-attributes, storage-organization, location)Attribute-metadata = (attribute-name, relation-name, domain-type,

position, length)User-metadata = (user-name, encrypted-password, group)Index-metadata = (index-name, relation-name, index-type,

index-attributes)View-metadata = (view-name, definition)

Page 26: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.26Database System Concepts

Large ObjectsLarge Objects

Large objects : binary large objects (blobs) and character large objects (clobs) Examples include:

text documents

graphical data such as images and computer aided designs audio and video data

Large objects may need to be stored in a contiguous sequence of bytes when brought into memory. If an object is bigger than a page, contiguous pages of the buffer

pool must be allocated to store it.

May be preferable to disallow direct access to data, and only allow access through a file-system-like API, to remove need for contiguous storage.

Page 27: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.27Database System Concepts

Modifying Large ObjectsModifying Large Objects

If the application requires insert/delete of bytes from specified regions of an object: B+-tree file organization (described later in Chapter 12) can be

modified to represent large objects

Each leaf page of the tree stores between half and 1 page worth of data from the object

Special-purpose application programs outside the database are used to manipulate large objects: Text data treated as a byte string manipulated by editors and

formatters.

Graphical data and audio/video data is typically created and displayed by separate application

checkout/checkin method for concurrency control and creation of versions

Page 28: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.28Database System Concepts

Data organization and retrievalData organization and retrieval

File organization can improve data retrieval time

Select *From depositorsWhere bname=“Downtown”

Mianus A-215Perry A-218Downtown A-101....

Brighton A-217Downtown A-101Downtown A-110......

Heap Ordered File

Searching a heap: must search all blocks (100 blocks)

OR

Searching an ordered File: 1. Binary search for the 1st tuple in answer : log 100 = 7 block accesses2. scan blocks with answer: no more than 2 Total <= 9 block accesses

100 blocks200 recs/blockAns: 150 recs

Page 29: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.29Database System Concepts

Data organization and retrievalData organization and retrieval

But... file can only be ordered on one search key:

Brighton A-217Downtown A-101Downtown A-110......

Ordered File (bname)Ex. Select * From depositors Where acct_no = “A-110”

Requires linear scan (100 BA’s)

Solution: Indexes! Auxiliary data structures over relations that can improve the search time

Page 30: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.30Database System Concepts

A simple indexA simple index

Brighton A-217 700Downtown A-101 500Downtown A-110 600Mianus A-215 700Perry A-102 400......

A-101A-102A-110A-215A-217......

Index of depositors on acct_no

Index records: <search key value, pointer (block, offset or slot#)>

To answer a query for “acct_no=A-110” we:

1. Do a binary search on index file, searching for A-1102. “Chase” pointer of index record

Index file

Page 31: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.31Database System Concepts

Index ChoicesIndex Choices

1. Primary: index search key = physical order search key

vs Secondary: all other indexes

Q: how many primary indices per relation?

2. Dense: index entry for every search key value

vs Sparse: some search key values not in the index

3. Single level vs Multilevel (index on the indices)

Page 32: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.32Database System Concepts

Measuring ‘goodness’Measuring ‘goodness’On what basis do we compare different indices?

1. Access type: what type of queries can be answered: selection queries (ssn = 123)? range queries ( 100 <= ssn <= 200)?

2. Access time: what is the cost of evaluating queries Measured in # of block accesses

3. Maintenance overhead: cost of insertion / deletion? (also BA’s)

4. Space overhead : in # of blocks needed to store the index

Page 33: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.33Database System Concepts

IndexingIndexing

STUDENTSsn Name Address

123 smith main str234 jones forbes ave345 smith forbes ave

… … …

123234345456567

Primary (or clustering) index on SSN

Page 34: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.34Database System Concepts

IndexingIndexing

STUDENTSsn Name Address

123 smith main str234 jones forbes ave345 tomson main str456 stevens forbes ave567 smith forbes ave

forbes aveforbes aveforbes avemain strmain str

Secondary (or non-clustering) index: duplicates may exist

Address-index

• Can have many secondary indices• but only one primary index

Page 35: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.35Database System Concepts

IndexingIndexing

STUDENTSsn Name Address

123 smith main str234 jones forbes ave345 tomson main str456 stevens forbes ave567 smith forbes ave

forbes avemain str

secondary index: typically, with ‘postings lists’

Postings lists

Page 36: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.36Database System Concepts

IndexingIndexing

STUDENTSsn Name Address

123 smith main str234 jones forbes ave345 tomson main str456 stevens forbes ave567 smith forbes ave

123456

Primary/sparse index on ssn (primary key)

>=123

>=456

Page 37: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.37Database System Concepts

IndexingIndexing

Ssn Name Address345 tomson main str234 jones forbes ave123 smith main str567 smith forbes ave456 stevens forbes ave

123234345456567

Secondary / dense index

Secondary on a candidate key:No duplicates, no need for posting lists

Page 38: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.38Database System Concepts

Primary vs SecondaryPrimary vs Secondary

1. Access type: Primary: SELECTION, RANGE

Secondary: SELECTION, RANGE but index must point to posting lists

2. Access time: Primary faster than secondary for range (no list access, all results

clustered together)

3. Maintenance Overhead: Primary has greater overhead (must alter index + file)

4. Space Overhead: secondary has more.. (posting lists)

Page 39: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.39Database System Concepts

Dense vs SparseDense vs Sparse

1. Access type: both: Selection, range (if primary)

2. Access time: Dense: requires lookup for 1st result

Sparse: requires lookup + scan for first result

3. Maintenance Overhead: Dense: Must change index entries

Sparse: may not have to change index entries

4. Space Overhead: Dense: 1 entry per search key value

Sparse: < 1 entry per search key value

Page 40: Storage and File Organization. 11.2Database System Concepts General Overview - rel. model  Relational model - SQL  Formal & commercial query languages.

11.40Database System Concepts

SummarySummary

Dense Sparse

Primary rare usual

secondary usual

• All combinations are possible

• at most one sparse/clustering index

• as many as desired dense indices

• usually: one primary index (probably sparse) and a few secondary indices (non-clustering)