Structured Files Chapter 19. Jim Gray, Andreas Reuter Transaction Processing - Concepts and...


Page 1: Structured Files Chapter 19. Jim Gray, Andreas Reuter Transaction Processing - Concepts and Techniques WICS August 2 - 6, 1999 What The Record Manager.

Time  | Aug. 2              | Aug. 3             | Aug. 4                   | Aug. 5              | Aug. 6
9:00  | Intro & terminology | TP mons & ORBs     | Logging & res. Mgr.      | Files & Buffer Mgr. | Structured files
11:00 | Reliability         | Locking theory     | Res. Mgr. & Trans. Mgr.  | COM+                | Access paths
13:30 | Fault tolerance     | Locking techniques | CICS & TP & Internet     | CORBA/EJB + TP      | Groupware
15:30 | Transaction models  | Queueing           | Advanced Trans. Mgr.     | Replication         | Performance & TPC
18:00 | Reception           | Workflow           | Cyberbricks              | Party               | FREE

Structured Files

Chapter 19


What The Record Manager Does

Storage allocation: store tuples in file blocks.

Tuple addressing: give each tuple an identifier (id) and provide fast access via that id.

Enumeration: fast enumeration of all of a relation's tuples.

Content addressing: give fast access via attribute values.

Maintenance: update/delete a tuple and its access paths.

Protection: support for security: encryption or tuple-granularity access control.


Outline

Representing values

Representing records

Storing records in pages and across pages

Organizing records (entry, relative, key, hash)

Examples of fix/log/log logic.


Record Allocation in a Page

Recall:

File is a collection of fixed-length pages (blocks).

File and buffer managers map files to disc/RAM.

[Figure: a slot on disk holds a block, which holds a page. Labels: Block Head, Page Head, page body, Page Dir, Block Trailer, checksum.]


Page Declares

    typedef struct                  /* global page numbers */
    {   FILENO      fileno;         /* file where the page lives */
        uint        pageno;         /* page number within the file */
    } PAGEID, *PAGEIDP;

    typedef struct
    {   PAGEID      thatsme;        /* identifies the page */
        PAGE_TYPE   page_type;      /* see description above */
        OBJID       object_id;      /* internal id of the relation, index, etc. */
        LSN         safe_up_to;     /* page LSN for the WAL protocol */
        PAGEID      previous;       /* often pages are members of doubly */
        PAGEID      next;           /* linked lists */
        PAGE_STATE  status;         /* valid, in-doubt, copy of something, etc. */
        int         no_entries;     /* # entries in page dir (see below) */
        int         unused;         /* free bytes not in freespace */
        int         freespace;      /* # contiguous free bytes for data */
        char        stuff[];        /* will grow */
    } PAGE_HEADER, *PAGE_PTR;


Different uses of pages

Data: Homogeneous record storage

Cluster: like Data except many different record types

Index (access path): hashed or B-tree

Free-space bitmap: describes status of 4,000 other pages.

Directory: meta-data about this or other files


Page Directory: Points to Records on Page

Record id is: File, Page, Directory_offset

[Figure: page layout. The page header is followed by the 1st through 5th tuples, and the page directory holds entries 1 to 5. Arrows mark the direction in which tuples are inserted and the direction in which the page directory grows.]
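The page directory idea can be sketched in a few lines of C. This is a minimal illustration, not the book's code: the tiny `PAGE_SIZE`, the `Page` and `DirEntry` types, and the function names are all assumptions made for the example. Tuples fill the body from the front, directory entries are written from the back, and a record is addressed by its directory index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 256   /* assumption: tiny page so the example is easy to follow */

/* Hypothetical page: tuples grow from the front of body[],
   directory entries grow from the end of body[] toward the front. */
typedef struct {
    uint16_t no_entries;   /* # entries in the page directory */
    uint16_t free_start;   /* first free byte after the last tuple */
    uint8_t  body[PAGE_SIZE];
} Page;

typedef struct { uint16_t offset; uint16_t length; } DirEntry;

static void page_init(Page *p) { p->no_entries = 0; p->free_start = 0; }

/* Byte position of directory entry i; entry 0 is the last thing in the body. */
static size_t dir_pos(int i)
{
    assert(i >= 0);
    return PAGE_SIZE - (size_t)(i + 1) * sizeof(DirEntry);
}

/* Insert a tuple; returns its directory index, or -1 if it does not fit
   in the contiguous free space between the tuples and the directory. */
static int page_insert(Page *p, const void *tuple, uint16_t len)
{
    size_t dir_bytes = (size_t)(p->no_entries + 1) * sizeof(DirEntry);
    if ((size_t)p->free_start + len + dir_bytes > PAGE_SIZE)
        return -1;
    DirEntry e = { p->free_start, len };
    memcpy(p->body + p->free_start, tuple, len);
    memcpy(p->body + dir_pos(p->no_entries), &e, sizeof e);
    p->free_start = (uint16_t)(p->free_start + len);
    return p->no_entries++;
}

/* Copy the tuple at directory index idx into out; returns its length or -1. */
static int page_read(const Page *p, int idx, void *out, size_t out_size)
{
    DirEntry e;
    if (idx < 0 || idx >= p->no_entries) return -1;
    memcpy(&e, p->body + dir_pos(idx), sizeof e);
    if (e.length > out_size) return -1;
    memcpy(out, p->body + e.offset, e.length);
    return e.length;
}
```

Because callers hold only the directory index, a tuple can be moved within the page by rewriting its directory entry, without changing the record id.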


Accessing a Record

Read by TID:
  Lock record shared
  Locate page
  Get semaphore shared
  Follow directory offset
  Copy tuple
  Give semaphore

Insert by TID:
  Lock record exclusive
  Locate page
  Get semaphore exclusive
  Find space
  Insert
  Log insert (tid, new value)
  Update page LSN, header, directory
  Give semaphore


Accessing a Record

Delete by TID:
  Lock record exclusive
  Locate page
  Get semaphore exclusive
  Add record to free space
  Log delete (tid, old value)
  Update page LSN, header, directory
  Give semaphore

Update by TID: much like delete-&-insert


Finding Space for Insert / Update

If tuple fits in page contiguous free-space: easy.

If tuple fits in page free space: reorganize (compress).
  Physiological logging makes this cheap.

If tuple does not fit then:
  Leave forwarding address on page.
  Optionally leave record prefix on page.
  Segment record among several pages.


Finding space within a file

Free space table:
  Summarizes status of many pages
  (8KB page => 64Kb => 500MB of 8KB data pages).
  Good for clustered & contiguous allocation.
  Bitmap should be transaction protected.
  If transaction aborts, page is freed again.
  Alternatively, treat bitmap as a hint.
  Rebuild periodically.

[Figure: free space directories. Directory entries (f2, f3, ..., f19, ...) describe the free space of the data pages (p1, p2, ..., P21, ...).]
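The arithmetic behind the free space table, and a first-fit allocation from it, can be sketched as follows. This is an illustrative sketch only: the convention that a set bit means "allocated", and all function names, are assumptions. One 8 KB bitmap page holds 65,536 bits, so it summarizes 65,536 data pages of 8 KB each, which is 512 MB (the slide rounds to 500 MB).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES        8192u               /* 8 KB page, as on the slide */
#define BITS_PER_MAP_PAGE (PAGE_BYTES * 8u)   /* one bit per data page */

/* Bytes of data pages summarized by a single bitmap page. */
static uint64_t bitmap_coverage_bytes(void)
{
    return (uint64_t)BITS_PER_MAP_PAGE * PAGE_BYTES;
}

/* Assumed convention: bit = 1 means the page is allocated. */
static int bitmap_test(const uint8_t *map, uint32_t page)
{
    return (map[page / 8] >> (page % 8)) & 1;
}

static void bitmap_set(uint8_t *map, uint32_t page)
{
    map[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Used when the allocating transaction aborts: the page is freed again. */
static void bitmap_clear(uint8_t *map, uint32_t page)
{
    map[page / 8] &= (uint8_t)~(1u << (page % 8));
}

/* Allocate the first free page at or after 'start' (a clustering hint).
   Returns the page number, or -1 if no free page is found. */
static long bitmap_alloc(uint8_t *map, uint32_t npages, uint32_t start)
{
    for (uint32_t p = start; p < npages; p++) {
        if (!bitmap_test(map, p)) {
            bitmap_set(map, p);
            return (long)p;
        }
    }
    return -1;
}
```

Passing a `start` near a related page is one simple way to get the clustered, contiguous allocation the slide mentions; logging and transaction protection of the bitmap are left out here.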


Finding space within a file

Free space cursor/list

Chain should be transaction protected.

Else: rebuild at restart;
  do not trust pointers (free page may be allocated).

[Figure: the file catalog holds empty_page_anchor, the head of the chain of empty pages, and point_of_insert, the page for the next insert.]


Tuple Allocation - I

The first strategy maintains a pointer to the “current block for insert” (CBI). When that block fills up, an empty block is requested from a system service, which then becomes the new “current block for insert”.

And so on. This is the sequential insert strategy.

Questions: What happens when the pointer arrives at the last block? How do we reclaim space freed by deleted tuples?

[Figure: the CBI pointer advances block by block while the head of the list of empty blocks supplies each new current block. At the last block: where next?]


Incremental Space Expansion - I

When the list of empty blocks is exhausted, there are two options to find space for new tuples. Let us assume the following configuration:

[Figure: a set of allocated blocks with the CBI pointer at one of them.]

The first option is to let the CBI pointer circulate over the set of allocated blocks, assuming that space is released by deleted tuples.

And so on. This works as long as enough space is freed up by deleted tuples. If there are only few gaps, finding space for a new tuple can become very expensive, because many blocks have to be probed sequentially.

The need to probe blocks that are completely filled can be avoided by maintaining an array of bits that contains one bit per block, indicating whether a block is full:

[Figure: bit array, for example 1 0 0 1 0.]
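A minimal sketch of the circulating CBI pointer with a one-bit-per-block "full" array follows. The `SpaceMap` structure, its fields, and the `probes` counter are hypothetical names introduced only to show how the bit array avoids touching blocks that are known to be full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory model: free_bytes[i] stands for the free space
   that would be found by reading block i; full[i] is the summary bit. */
typedef struct {
    size_t  nblocks;
    size_t  cbi;          /* current block for insert */
    size_t *free_bytes;
    bool   *full;
    size_t  probes;       /* blocks actually inspected, for illustration */
} SpaceMap;

/* Starting at the CBI, circulate once over the allocated blocks and
   return the first block with room for len bytes, or -1 if there is none. */
static long find_block(SpaceMap *m, size_t len)
{
    for (size_t step = 0; step < m->nblocks; step++) {
        size_t b = (m->cbi + step) % m->nblocks;
        if (m->full[b])
            continue;                      /* skipped without reading the block */
        m->probes++;
        if (m->free_bytes[b] >= len) {
            m->free_bytes[b] -= len;
            if (m->free_bytes[b] == 0)
                m->full[b] = true;
            m->cbi = b;                    /* the CBI now points here */
            return (long)b;
        }
    }
    return -1;
}
```

A single bit only separates "full" from "not full", so a block with a small gap is still probed for every larger tuple; a byte per block recording how full it is would reduce such probes at the cost of a larger map.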


Naming Tuples (records)

Relative byte address: file, offset in file.
  OK for insert-then-read-only DBs.
  Record can't easily grow.
  Deleted space not easily reclaimed.

Tuple identifier: file, page, index. The design shown below.
  Main disadvantage: expensive reorganization (fixing overflows).

[Figure: a TID (nodeid, fileid, pageno 7446, dir_index 3) addresses "this tuple" through the directory of page 7446. In the overflow case, the directory entry on page 7446 holds a pseudo-TID that points to page 5127, where the tuple is stored.]
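How a TID stays stable when a tuple overflows to another page can be sketched like this. Everything here is an assumption for illustration: node and file ids are omitted, pages are a small in-memory array, and a directory slot either holds the tuple or a forwarding pseudo-TID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t pageno; uint16_t dir_index; } TID;

/* Hypothetical directory slot: the tuple itself, or a pseudo-TID. */
typedef struct {
    bool        forwarded;
    TID         forward_to;
    const char *tuple;
} Slot;

#define SLOTS_PER_PAGE 4
typedef struct { uint32_t pageno; Slot slots[SLOTS_PER_PAGE]; } Page;

static const Page *find_page(const Page *pages, int npages, uint32_t pageno)
{
    for (int i = 0; i < npages; i++)
        if (pages[i].pageno == pageno)
            return &pages[i];
    return NULL;
}

/* Resolve a TID, following at most one forwarding hop.
   *pages_read counts page accesses: 1 normally, 2 for a moved tuple. */
static const char *tid_lookup(const Page *pages, int npages, TID tid,
                              int *pages_read)
{
    *pages_read = 0;
    for (int hop = 0; hop < 2; hop++) {
        const Page *p = find_page(pages, npages, tid.pageno);
        if (p == NULL || tid.dir_index >= SLOTS_PER_PAGE)
            return NULL;
        (*pages_read)++;
        const Slot *s = &p->slots[tid.dir_index];
        if (!s->forwarded)
            return s->tuple;
        tid = s->forward_to;   /* the caller's TID is never changed */
    }
    return NULL;               /* longer chains are treated as an error here */
}
```

The extra page access for forwarded tuples is the cost that reorganization (fixing overflows) removes.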


Implementing Database Keys

Address record via directory

Address has an ID to allow for invalidation

ID never reused.

Pointer can be swizzled.

Popular with network & OO DBs

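[Figure: the database key of "this tuple" consists of nodeid K, fileid A, and record seq. no. 7. Entry 7 of the database key translation table for file A at node K holds a pageid (7446) and an index into that page's directory; the page directory entry holds the offset and the id (11) of the tuple.]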


Naming Tuples via Primary Key

{Entry Sequenced, Relative}: primary key is physical address.

{Hash, B-tree}: primary key is content (primary key).

Primary key is an alternative to DBkey.

B-tree clusters related data.

Problems:
  B-tree access is slower than hash.
  Hash & B-tree keys are not fixed length,
  but neither is node.db_key.

Benefit: key can grow to LARGE databases.

Good for distributed/partitioned data.

It’s religious.


Datatype Representation

E: External representation: ASCII, ISO Latin1, Unicode, ...

P: Programming language representation
  many: PL/1, Cobol, C all have different VARCHAR
  many type mismatches between P and F: interval, datetime, user, ...

F: File representation: "native" types (e.g.: null values, ...).

Lots of mapping functions.

It would be great if F^-1(F(x)) = x for these functions, but ....

Called the impedance mismatch between DB and PL.

Mapping functions between E, P, and F:
  m_EP: value input from the user
  m_PE: value output to the user
  m_PF: modification through application program
  m_FP: SELECTing values into application program
  m_EF: input through interactive SQL
  m_FE: interactive query results


Datatype Representations

P = F: Implies a special language (all other languages are 2nd class).

E = F: Use characters for everything.
  Problem: E changes from country to country!
  (all other languages are 2nd class)

No easy way out of this.

Unicode will help most of us and make E = F more attractive.

[Figure: the mapping functions m_EP, m_PE, m_PF, m_FP, m_EF, m_FE between the E, P, and F representations.]


Representing Records

[Figure: meta data (relations, attributes, and per-attribute descriptions giving field type, length, and offset) is used for tuple addressing into the physical tuple (attr.1, attr.2, attr.3, attr.4, attr.5, ...).]


Representing Records

    struct relations
    {   Uint    relation_no;                /* internal id for the relation */
        char *  owner;                      /* user id of the creator */
        long    creation_date;              /* date when it was created */
        PAGENO  current_point_of_insert;    /* free space done via */
        PAGENO  empty_page_anchor;          /* free space cursor method */
        Uint    no_of_attributes;           /* # attributes in relation */
        Uint    no_of_fixed_atts;           /* # fixed-length attributes */
        Uint    no_of_var_atts;             /* # variable-length attributes */
        struct attributes * p_attr;         /* pointer to the attributes array */
    };

    struct attributes                       /* one element of the attributes array */
    {   char *  attribute_name;             /* external name of the attribute */
        Uint    attribute_position;         /* index of the field in the tuple (1,2,...) */
        char    attribute_type;             /* this encodes the SQL type definition */
        Boolean var_length;                 /* is it a variable-length field? */
        Boolean nulls_allowed;              /* can field assume NULL value? */
        char *  default_value;              /* value assumed if none stored in tuple */
        Uint    field_length;               /* maximum length of field */
        int     accumulated_offset;         /* explained later */
        Uint    significant_digits;         /* for data type FIXED */
        char *  encryption_key;             /* if the value is encrypted */
        char *  rest;                       /* further information on the attribute */
    };


Representing Records

Generic header (rid, tid, #fields)

All fixed-length encoding (fat records, fast-simple code, max < page path length)

Variable fields have length (short records, slow code)

Type-length-value (simple slow code, easy reorg)

Fixed + ptrs to variables (compact, fast code)

[Figure: alternative layouts of a six-field tuple (F1 ... F6) with field lengths or offsets stored in different places. Each layout starts with the general prefix to all tuple representations: relation-id, tuple-id, and the number of fields in the tuple or the actual tuple length.]


Representing Records (Reuter Recommends)


Some Details

Representing null values:
  missing field
  special value
  extra field
  bitmap

Representing keys:
  Efficient comparison is important.
  Store "conditioned" key so simple byte-compare works.
  Flip integer sign (so negative sorts low).
  Flip float so exponent first, mantissa second, flipped signs.
  Compress varchars.
  MANY refinements.
  Want an order-preserving compression.
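Key conditioning can be illustrated with a short sketch. The function names are hypothetical, and the float encoding shown is the common technique for IEEE 754 binary64 values (assumed to be the platform's `double`), not necessarily the exact scheme the slide has in mind: flip the sign bit of non-negative values, flip all bits of negative values, and store the result most significant byte first. NaNs are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Condition a signed 32-bit integer: flip the sign bit and store it
   big-endian, so that memcmp order equals numeric order. */
static void condition_int32(int32_t v, uint8_t out[4])
{
    uint32_t u = (uint32_t)v ^ 0x80000000u;
    out[0] = (uint8_t)(u >> 24);
    out[1] = (uint8_t)(u >> 16);
    out[2] = (uint8_t)(u >> 8);
    out[3] = (uint8_t)u;
}

/* Condition a double (assumed IEEE 754 binary64, NaN excluded).
   The bit layout is already sign, exponent, mantissa; after the flips
   below, a plain byte compare orders the values numerically. */
static void condition_double(double d, uint8_t out[8])
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    if (u & 0x8000000000000000ull)
        u = ~u;                          /* negative: invert everything */
    else
        u |= 0x8000000000000000ull;      /* non-negative: set the top bit */
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(u >> (56 - 8 * i));
}
```

With keys stored in this form, the access path can compare any two keys with `memcmp` and needs no type-specific comparison code.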


Fat Records (Longer Than a Page)

Record must fit on page.

Long fields segregated to separate page: may be good in some cases (multi-media DBs).

Overflow page chains.

Segment record across pages.

[Figure: the alternatives above. A tuple larger than page p: "No Way". A tuple on page p with its long field on page k. A long tuple laid out in a linear address space over pages p, p+1, p', p'', sharing pages with other tuples and empty space.]


Obese records (Longer Than 10 Pages)

If record is super-large, then may want to index into it quickly.

"Obvious" design is standard tree.

Record is root of tree.

Grow levels when one fills.

Allows blob growth, update,...


Non-Normalized Relations


Structured File Definition

[Figure: taxonomy of files. A file is either unstructured (system sequenced) or structured. Structured files are associative or non-associative. Organizations shown: entry sequenced, relative, keyed, hash, clustered.]


File Layouts

Unstructured: a sequence of bytes

Structured, Entry Sequenced:
  Records inserted at end (eof).
  Records cannot grow.
  Key is RBA (relative byte address).

Relative: fixed-size record slots.
  Records limited by that size.
  Key is relative record number.
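Both non-associative keys map to a page position by simple arithmetic. The sketch below assumes a hypothetical `PAGE_BODY` of usable bytes per page and slots that never span pages; the names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BODY 4096u   /* assumption: usable bytes per page */

typedef struct { uint32_t pageno; uint32_t offset; } PagePos;

/* Entry sequenced: the key is a relative byte address within the file. */
static PagePos rba_to_pos(uint64_t rba)
{
    PagePos p = { (uint32_t)(rba / PAGE_BODY), (uint32_t)(rba % PAGE_BODY) };
    return p;
}

/* Relative file: the key is a record number; slots have a fixed size. */
static PagePos rrn_to_pos(uint32_t rrn, uint32_t slot_size)
{
    assert(slot_size > 0 && slot_size <= PAGE_BODY);
    uint32_t per_page = PAGE_BODY / slot_size;   /* whole slots per page */
    PagePos p = { rrn / per_page, (rrn % per_page) * slot_size };
    return p;
}
```

Neither mapping needs an index, which is why these organizations are cheap to address but cannot locate a record by its content.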


Associative File Types

Hashed: Records addressed by key field(s).
  Bucket has list of records.
  Overflow to other buckets or to overflow pages.

Key Sequenced: Records addressed by key field(s).
  Records in sorted order.
  Either sorting or B-tree or ...

[Figure: key-sequenced pages holding the As, Bs, ..., Ys, Zs.]


Parameters at Create

Database
Record type (fields)
Key
Organization {Entry Sequenced, Relative, Hashed, Key Sequenced}
Block size (page size)
Extent size (storage area)
Partitioning (among discs or nodes) by key.
Attributes:
  access control
  allocation and archive strategy
  transactional
  lifetime, zero on free, and on and on ....


Parameters at Create

"Secondary" indices.
  Primary key is .... (e.g. customer number).
  Secondary key is social security number.
  Non-unique secondary key is Last_Name, First_name.

Secondary indices can be {unique or not} and {hashed or Key Sequenced}.

Index is like a table.
  Fields of index are: secondary key, primary key.
  So can define index on any kind of base table.

[Figure: Base Table.]


Secondary Index Example

Base table is key-sequenced on CustomerNumber.

Index table is key-sequenced on Name-CustomerNumber.

Index can be a replica of the base table in another order.

Transaction recovery and locking keep them consistent.

Tuple management system:
  Maintains indices (insert, update, delete).
  Navigates to base table via secondary index as one request.


What happens when you open a relation?

Many files get opened.
  Read directory (catalog)
  Partitions, Indices

[Figure: interaction between the access module and the tuple oriented file system.]

Access module:
  open (filename, .....)

Tuple oriented file system:
  read file descriptor
  do security checking
  return file descriptor

Access module:
  read file descriptor
  if there are other partitions: open partitions
  if there are indices: open indices
  access the file


Once OPEN, Application can SCAN the relation

Scan is a row & column subset
  SELECT <column list> FROM <table> WHERE <predicate>

With a specified start/stop key
  AND <key> BETWEEN <low> AND <high>

In a specified order (supported by a secondary index)
  ASCENDING | DESCENDING

A locking protocol
  {Serializable | Repeatable Read | Committed Read | Uncommitted Read | Skip Uncommitted | …}

TIMEOUT <seconds>


SCAN States

Scan state: Before, At, After, Null (scan closed).

[Figure: for each scan state, the position of the scan relative to the tuples in the scan, represented by their key values K1, K2, K3, K4, K5, ..., Kn.]


SCAN States: How they change

On error, scan state does not change.

On open, scan is {before | after} the {first | last} set element if scan is {ascending | descending}.

On fetch next: if {not end of set | at end of set}, scan is {at next | before first | after last} element.

On insert, scan is at element.

On delete, scan is at the missing element.
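These transitions can be sketched as a small state machine. This is an illustrative sketch for an ascending scan over integer keys only; the `Scan` type, the state names, and the return codes are assumptions. The position is kept as a key value rather than as a slot pointer, which is what lets the scan stay "at the missing element" after a delete.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SCAN_BEFORE, SCAN_AT, SCAN_AFTER, SCAN_NULL } ScanState;

typedef struct {
    ScanState state;
    int       key;     /* key the scan is at (meaningful in SCAN_AT) */
} Scan;

static void scan_open(Scan *s)  { s->state = SCAN_BEFORE; s->key = 0; }
static void scan_close(Scan *s) { s->state = SCAN_NULL; }

/* Ascending fetch-next over a sorted key array.
   Returns 1 and sets *out on success, 0 at end of set,
   and -1 on error (the scan state is left unchanged). */
static int scan_fetch_next(Scan *s, const int *keys, size_t n, int *out)
{
    if (s->state == SCAN_NULL)
        return -1;
    if (s->state == SCAN_AFTER)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (s->state == SCAN_BEFORE || keys[i] > s->key) {
            s->state = SCAN_AT;
            s->key = keys[i];
            *out = keys[i];
            return 1;
        }
    }
    s->state = SCAN_AFTER;   /* at end of set: scan is after the last element */
    return 0;
}
```

If the tuple with the current key is deleted, `s->key` still names the missing element, and the next fetch returns the smallest remaining key above it.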


SCAN States: How they change

On update: scan position is not affected.
  If tuple moves (because ordering attributes affected), scan key position is unchanged.
  Can create Halloween problem (give everybody a 10% raise).
  But scan enumerates entire set.

[Figure: tuples in the scan, represented by their key values K1, K2, K3, K4, K5, ..., Kn, with the scan direction marked. An update moves the tuple with key K3 to a later position (moved tuple).]

Scan is "at" key K3 after the delete, even if the record moves.


SCAN Data structure

    enum SCAN_STATE { TOP, ON, BOTTOM, BETWEEN, NIL };     /* the 5 scan states */
    enum ISOLATION  { UNCOMMITTED_READ,..., SERIALIZABLE, READ_PAST, BOUNCE };

    typedef struct
    {   Uint        scanid;         /* handle for scan; returned by open_scan */
        TRID        owner;          /* which transaction uses the scan */
        FILE *      fileid;         /* handle of file the scan is defined on */
        char *      scan_key;       /* specification of scan key attribute(s) */
        char *      start_key;      /* lower bound of scan range */
        char *      stop_key;       /* upper bound of scan range */
        char *      filter;         /* qualifying predicate for all tuples in scan */
        ISOLATION   isol_degree;    /* locking policy for tuples accessed */
        SCAN_STATE  scan_state;     /* state of scan pointer */
        char        scan_key[ ];    /* scan key the scan is before, at, or after */
    } SCANCB, * SCANCBP;


Entry Sequenced File Insert

fix page descriptor page
find eof page
fix eof data page
if no space in page <see next slide for transaction to advance page>
unfix descriptor page
add record to page (updating on-page directory)
generate log record (new value) and update page LSN.
compute lock name of record (based on TID).
get lock on record
unfix data page.

To make this work, MUST be assured lock is available.
Otherwise page sem can (undetected) deadlock with lock wait.
So, UNDO of entry-sequence insert does not free the space, it just invalidates the record.


Entry Sequenced File Insert If EOF page or File is Full

Begin new transaction (will not abort if insert aborts) to extend file EOF page (leaves insert transaction).

unfix directory page
if file full, panic()
start a top-level transaction (the top-level transaction to extend the file)
  fix the directory
  advance the page eof, updating directory and freespace
  log the changes
  fix the data page
  format it
  log the change
  unfix the directory and data page
commit the transaction & resume insert transaction
fix directory, fix eof, check to see that there is room for the record.


Entry Sequenced Operations

Delete by RBA:
  get record lock (node, file, RBA) exclusive
  if {timeout, deadlock, error} return error;
  Fix page
  Mark record invalid
  Generate log record
  Update page LSN
  Unfix page.

Read by RBA:
  get record lock (node, file, RBA) shared
  if {timeout, deadlock, error} return error;
  Fix page
  if record valid, copy to buffer
  Unfix page
  Return record or null

Note: both must test that RBA <= EOF. Update, ReadNext, ... are similar.
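The data-structure side of these operations can be sketched with an in-memory model. This is a simplified illustration: locking, page fixing, and logging are deliberately left out, the whole file is one byte array, and `EsFile`, `RecHdr`, and the function names are hypothetical. It shows the two points the slides stress: delete only invalidates the record, and every access checks the RBA against EOF.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FILE_BYTES 1024   /* assumption: small fixed capacity for the example */

typedef struct {
    uint8_t  data[FILE_BYTES];
    uint32_t eof;             /* RBA one past the last record */
} EsFile;

typedef struct { uint8_t valid; uint16_t length; } RecHdr;

/* Append at EOF; returns the RBA of the new record, or -1 if the file is full. */
static long es_insert(EsFile *f, const void *rec, uint16_t len)
{
    RecHdr h = { 1, len };
    if ((size_t)f->eof + sizeof h + len > FILE_BYTES)
        return -1;
    long rba = (long)f->eof;
    memcpy(f->data + f->eof, &h, sizeof h);
    memcpy(f->data + f->eof + sizeof h, rec, len);
    f->eof += (uint32_t)(sizeof h + len);
    return rba;
}

/* Delete marks the record invalid; the space is not reclaimed. */
static int es_delete(EsFile *f, long rba)
{
    RecHdr h;
    if (rba < 0 || (size_t)rba + sizeof h > f->eof)
        return -1;                         /* RBA must lie below EOF */
    memcpy(&h, f->data + rba, sizeof h);
    if (!h.valid)
        return -1;
    h.valid = 0;
    memcpy(f->data + rba, &h, sizeof h);
    return 0;
}

/* Returns the record length, or -1 for a bad RBA or an invalid record. */
static int es_read(const EsFile *f, long rba, void *out, size_t out_size)
{
    RecHdr h;
    if (rba < 0 || (size_t)rba + sizeof h > f->eof)
        return -1;
    memcpy(&h, f->data + rba, sizeof h);
    if (!h.valid || h.length > out_size)
        return -1;
    memcpy(out, f->data + rba + sizeof h, h.length);
    return h.length;
}
```

The sketch trusts that a given RBA points at a record boundary; a real record manager would have to validate that as well.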


Relative Files

Records fit in fixed-length slots.

Operation on slots.

Separate transactions extend the file EOF (allocate and format pages).

[Figure: a page with a Page Header, fixed-length slots (some marked Empty Slot), and a Page Directory holding the record lengths: ... 10 88 18 0 62 82 100 75.]


Relative Files

{Read | Insert | Update | Delete} by key are all easy

Insert "near" key works by:

Plan A: look at page, then look at neighbor pages (left, right, left, right, ...).

Plan B: allocate overflow page for base page.

Plan C: look in free-space bit-map or byte (%full) map.


Key Sequenced or Hashed Files

Key sequenced is subject of next chapter.


File Clustering

Different record types kept in same page/file

For example: master and detail records of an invoice.
Detail records always accessed if master is.

Situation:
  Master key: InvoiceNo
  Detail key: InvoiceNo (Foreign Key References Master) + SequenceNo

Technique:
  Hash or key sequence Master on InvoiceNo.
  Hash or key sequence Detail on InvoiceNo+SequenceNo in same table.


Clustering different record types in a page

One disc request gets the entire order.

Concept works for any storage hierarchy.

Is natural for hierarchical database systems.

[Figure: a page clustering master records (10, 20, 33) with their detail records (10 0, 10 1, 10 2, 10 3, 10 4; 20 0, 20 1; 33 0, 33 1, 33 2).]
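One way to obtain this layout in a key-sequenced file is to order both record types by a shared composite key. The sketch below is an assumption-laden illustration: the `ClusterKey` type and its `is_detail` flag are hypothetical, chosen so that each master sorts immediately before its own details.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical clustering key shared by master and detail records. */
typedef struct {
    int invoice_no;
    int is_detail;   /* 0 = master, 1 = detail: the master sorts first */
    int seq_no;      /* meaningful for detail records only */
} ClusterKey;

static int cluster_cmp(const void *a, const void *b)
{
    const ClusterKey *x = a;
    const ClusterKey *y = b;
    if (x->invoice_no != y->invoice_no)
        return x->invoice_no < y->invoice_no ? -1 : 1;
    if (x->is_detail != y->is_detail)
        return x->is_detail < y->is_detail ? -1 : 1;
    if (x->seq_no != y->seq_no)
        return x->seq_no < y->seq_no ? -1 : 1;
    return 0;
}

/* Put records into clustered order: master, then its details by seq_no. */
static void cluster_sort(ClusterKey *keys, size_t n)
{
    qsort(keys, n, sizeof *keys, cluster_cmp);
}
```

Stored in this order, an invoice and its lines tend to fall on the same page, so reading the master usually brings the details along.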


Summary

Representing values

Representing records

Storing records in pages and across pages

Organizing records (entry, relative, key, hash)

Examples of fix/log/log logic.