Representing Data Elements
description
Transcript of Representing Data Elements
![Page 1: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/1.jpg)
1
Representing Data Elements
Fields, Records, BlocksVariable-length DataModifying Records
Source: our textbook
![Page 2: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/2.jpg)
2
Overview
Attributes are represented by sequences of bytes, called fields
Tuples are represented by collections of fields, called records
Relations are represented by collections of records, called files
Files are stored in blocks, using specialized data structures to support efficient modification and querying
![Page 3: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/3.jpg)
3
Representing SQL Data Types
integers and reals: built-in CHAR(n): array of n bytes VARCHAR(n): array of n+1 bytes
(extra byte is either string length or null char)
dates and times: fixed length strings etc.
![Page 4: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/4.jpg)
4
Representing Tuples
For now, assume all attributes (fields) are fixed length.
Concatenate the fields Store the offset of each field in
schema0 30 286 287 297
nameCHAR(30)30 bytes
addressVARCHAR(255)256 bytes
genderCHAR(1)1 byte
birthdateDATE10 bytes
![Page 5: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/5.jpg)
5
More on Tuples
Due to hardware considerations, certain types of data need to start at addresses that are multiples of 4 or 8
Previous example becomes:0 32 288 292 304
nameCHAR(30)30 bytes+ 2
addressVARCHAR(255)256 bytes
genderCHAR(1)1 byte+ 3
birthdateDATE10 bytes+ 2
![Page 6: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/6.jpg)
6
Record Headers
Often it is convenient to keep some "header" information in each record: a pointer to schema information
(attributes/fields, types, their order in the tuple, constraints)
length of the record/tuple timestamp of last modification
![Page 7: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/7.jpg)
7
Packing Records into Blocks
Start with block header: timestamp of last modification/access offset of each record in the block, etc.
Follow with sequence of records May end with some unused space
headerrecord 1record 2 … record n-1record n
![Page 8: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/8.jpg)
8
Representing Addresses
Often addresses (pointers) are part of records: the application data in object-oriented databases as part of indexes and other data structures
supporting the DBMS Every data item (block, record, etc.) has
two addresses: database address: address on the disk (typically 8-16 bytes) memory address, if the item is in memory
(typically 4 bytes)
![Page 9: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/9.jpg)
9
Translation Table
Provides mapping from database addresses to memory addresses for all blocks currently in memory
Later we'll discuss how to implement it
![Page 10: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/10.jpg)
10
Pointer Swizzling
When a block is moved from disk into main memory, change all the disk addresses that point to items in this block into main memory addresses.
Need a bit for each address to indicate if it is a disk address or a memory address.
Why? Faster to follow memory pointers (only uses a single machine instruction).
![Page 11: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/11.jpg)
11
Example of Swizzling
Block 1
Block 2
Disk Main Memory
read intomain memory swizzled
unswizzled
![Page 12: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/12.jpg)
12
Swizzling Policies
Automatic swizzling: as soon as block is brought into memory, swizzle all relevant pointers
Swizzling on demand: only swizzle a pointer if and when it is actually followed
No swizzling: always refer to translation table
Programmer control
![Page 13: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/13.jpg)
13
Automatic Swizzling
Locating all pointers within a block: refer to the schema, which will indicate where
addresses are in the records for index structures, pointers are at known
locations Update translation table with memory
addresses of items in the block Update pointers in the block (in memory)
with memory addresses, when possible, as obtained from translation table
![Page 14: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/14.jpg)
14
Unswizzling
When a block is moved from memory back to disk, all pointers must go back to database (disk) addresses
Use translation table again Important to have an efficient data
structure for the translation table
![Page 15: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/15.jpg)
15
Pinned Records and Blocks
A block in memory is pinned if it cannot be safely written back to disk
Indicate with a bit in the block header
Reasons for pinning: related to failure recovery (more later) because of pointer swizzling
If block B1 has swizzled pointer to an item in block B2, then B2 is pinned.
![Page 16: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/16.jpg)
16
Unpinning a Block Consider each item in the block to be
unpinned Keep in the translation table the places
in memory holding swizzled pointers to that item (e.g., with a linked list)
Unswizzle those pointers: use translation table to replace the memory addresses with database (disk) addresses
![Page 17: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/17.jpg)
17
Variable Length Data
Data items with varying size (e.g., if maximum size of a field is large but most of the time the values are small)
Variable-format records (e.g., NULLs method for representing a hierarchy of entity sets as relations)
Records that do not fit in a block (e.g., an MPEG of a movie)
![Page 18: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/18.jpg)
18
Variable-Length Fields
Store the fixed-length fields before the variable-length fields in each record
Keep in the record header record length pointers to the beginnings of all the
variable-length fields Book discusses variations on this idea
![Page 19: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/19.jpg)
19
Variable Length Fieldsotherheaderinfo
recordlength
to var lenfield 2
var lenfield 2
var lenfield 3
fixed lenfield 2
var lenfield 1
fixed lenfield 1
to var lenfield 3
![Page 20: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/20.jpg)
20
Variable-Format Records
Represent by a sequence of tagged fields
Each tagged field contains name type length, if not deducible from the type value
![Page 21: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/21.jpg)
21
Splitting Records Across Blocks
Called spanned records Useful when
record size exceeds block size putting an integral number of records in a
block wastes a lot of the block (e.g., record size is 51% of block size)
Each record or fragment header contains bit indicating if it is a fragment if fragment then pointers to previous and
next fragments of the record (i.e., a linked list)
![Page 22: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/22.jpg)
22
Record Modification
Modifications to records: insert delete update
issues even with fixed-length records and fields
even more involved with variable-length data
![Page 23: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/23.jpg)
23
Inserting New Records
If records need not be any particular order, then just find a block with enough empty space
Later we'll see how to keep track of all the tuples of a given relation
But what if blocks should be kept in a certain order, such as sorted on primary key?
![Page 24: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/24.jpg)
24
Insertion in Order
If there is space in the block, then add the record(going right to left), add a pointer to it (going leftto right) and rearrange the pointers as needed.
record4
record3
record2
record1
unused
header
![Page 25: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/25.jpg)
25
What if Block is Full?
Records are stored in several blocks, in sorted order
One approach: keep a linked list of "overflow" blocks for each block in the main sequence
Another approach is described in the book
![Page 26: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/26.jpg)
26
Deleting Records
Try to reclaim space made available after a record is deleted
If using an offset table, then rearrange the records to fill in any hole that is left behind and adjust the pointers
Additional mechanisms are based on keeping a linked list of available space and compacting when possible
![Page 27: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/27.jpg)
27
Tombstones
What about pointers to deleted records?
We place a tombstone in place of each deleted record
Tombstone is permanent Issue of where to place the tombstone Keep a tombstone bit in each record
header: if this is a tombstone, then no need to store additional data
![Page 28: Representing Data Elements](https://reader035.fdocuments.in/reader035/viewer/2022062722/56813ae1550346895da33223/html5/thumbnails/28.jpg)
28
Updating Records
For fixed-length records, there is no effect on the storage system
For variable-length records: if length increases, like insertion if length decreases, like deletion
except tombstones are not necessary