1 CS 232A: Database System Principles Notes 03: Disk Organization.

CS 232A: Database System Principles

Notes 03: Disk Organization

• How to lay out data on disk• How to move it to memory

Topics for today

What are the data items we want to store?

• a salary• a name• a date• a pictureWhat we have available: Bytes

To represent:

• Integer (short): 2 bytese.g., 35 is

00000000 00100011

• Real, floating pointn bits for mantissa, m for exponent….

• Characters various coding schemes suggested,

most popular is ascii

To represent:

Example:A: 1000001a: 11000015: 0110101LF: 0001010

• Booleane.g., TRUE

1111 1111

0000 0000

To represent:

• Application specifice.g., RED 1 GREEN 3

BLUE 2 YELLOW 4 …Can we use less than 1

byte/code?Yes, but only if desperate...

• Datese.g.: - Integer, # days since Jan 1, 1900

- 8 characters, YYYYMMDD - 7 characters, YYYYDDD

(not YYMMDD! Why?)• Time

e.g. - Integer, seconds since midnight - characters, HHMMSSFF

To represent:

• String of characters– Null terminated

– Length givene.g.,

- Fixed length

To represent:

• Bag of bits

Length Bits

To represent:

Key Point

• Fixed length items

• Variable length items- usually length given at beginning

• Type of an item: Tells us how to interpret(plus size if

fixed)

Data Items

Records

Blocks

Memory

Overview

Record - Collection of related data

items (called FIELDS)E.g.: Employee record:

name field,salary field,date-of-hire field, ...

Types of records:

• Main choices:– FIXED vs VARIABLE FORMAT– FIXED vs VARIABLE LENGTH

A SCHEMA (not record) containsfollowing information

- # fields- type of each field- order in record- meaning of each field

Fixed format

Example: fixed format and length

Employee record(1) E#, 2 byte integer(2) E.name, 10 char. Schema(3) Dept, 2 byte code

55 s m i t h 02

83 j o n e s 01

Records

• Record itself contains format“Self Describing”

Variable format

Example: variable format and length

4I52 4S DROF46

Field name codes could also be strings, i.e. TAGS

Variable format useful for:

• “sparse” records• repeating fields• evolving formats

But may waste space...

• EXAMPLE: var format record with repeating fields

Employee one or more children

3 E_name: Fred Child: SallyChild: Tom

Note: Repeating fields does not imply- variable format, nor- variable size

John Sailing Chess --

• Key is to allocate maximum number ofrepeating fields (if not used null)

Many variants betweenfixed - variable format:

Ex. #1: Include record type in record

record type record lengthtells me whatto expect(i.e. points to schema)

5 27 . . . .

Record header - data at beginning

that describes recordMay contain:

- record type- record length- time stamp- other stuff ...

Ex #2 of variant between FIXED/VAR format

• Hybrid format– one part is fixed, other variable

E.g.: All employees have E#, name, deptother fields vary.

25 Smith Toy 2 retiredHobby:chess

# of varfields

Also, many variations in internal

organization of record

Just to show one: length of field

3 F310 F1 5 F2 12 * * *

3 32 5 15 20 F1 F2 F3

total size

offsets

0 1 2 3 4 5 15 20

Question:We have seen examples for

* Fixed format and length records* Variable format and length records

(a) Does fixed format and variable length

make sense?(b) Does variable format and fixed length

make sense?

Other interesting issues:

• Compression– within record - e.g. code selection– collection of records - e.g. find

common patterns

• Encryption

Next: placing records into blocks

blocks ...

a file

assume fixedlength blocks

assume a single file (for now)

(1) separating records(2) spanned vs. unspanned(3) mixed record types – clustering(4) split records(5) sequencing(6) indirection

Options for storing records in blocks:

(a) no need to separate - fixed size recs.(b) special marker(c) give record lengths (or offsets)

- within each record- in block header

(1) Separating records

R2R1 R3

• Unspanned: records must be within one block

block 1 block 2

• Spannedblock 1 block 2

(2) Spanned vs. Unspanned

R3 R4 R5

R2 R3(a)

R3(b) R6R5R4 R7

need indication need indication

of partial record of continuation

“pointer” to rest (+ from where?)

R1 R2 R3(a)

R3(b) R6R5R4 R7

With spanned records:

• Unspanned is much simpler, but may waste space…

• Spanned essential if record size > block size

Spanned vs. unspanned:

Example

106 recordseach of size 2,050 bytes (fixed)block size = 4096 bytes

block 1 block 2

2050 bytes wasted 2046 2050 bytes wasted 2046

• Total wasted = 2 x 109 Utiliz = 50%• Total space = 4 x 109

• Mixed - records of different types(e.g. EMPLOYEE, DEPT)allowed in same block

e.g., a block

(3) Mixed record types

EMP e1 DEPT d1 DEPT d2

Why do we want to mix?Answer: CLUSTERING

Records that are frequently accessed together should bein the same block

Compromise:

No mixing, but keep relatedrecords in same cylinder ...

Example

Q1: select A#, C_NAME, C_CITY, …from DEPOSIT, CUSTOMERwhere DEPOSIT.C_NAME =

CUSTOMER.C.NAME

a blockCUSTOMER,NAME=SMITH

DEPOSIT,NAME=SMITH

• If Q1 frequent, clustering good• But if Q2 frequent

Q2: SELECT * FROM CUSTOMER

CLUSTERING IS COUNTER PRODUCTIVE

Fixed part in one block

Typically forhybrid format

Variable part in another block

(4) Split records

Block with fixed recs.

R1 (a)R1 (b)

Block with variable recs.

R2 (a)

R2 (b)

R2 (c)

This block alsohas fixed recs.

Question

What is difference between- Split records- Simply using two different

record types?

• Ordering records in file (and block) by some key value

Sequential file ( sequenced)

(5) Sequencing

Why sequencing?

Typically to make it possible to efficiently read records in order(e.g., to do a merge-join — discussed later)

Sequencing Options

(a) Next record physically contiguous

(b) Linked

Next (R1)R1

R1 Next (R1)

(c) Overflow area

Recordsin sequence

Sequencing Options

header

• How does one refer to records?

(6) Indirection

Many options: Physical Indirect

Purely Physical

Device IDE.g., Record Cylinder #

Address = Track #or ID Block #

Offset in block

Block ID

Fully Indirect

E.g., Record ID is arbitrary bit string

maprec ID r address

Physicaladdr.Rec ID

Tradeoff

Flexibility Costto move records of indirection(for deletions, insertions)

Physical Indirect

Many optionsin between …

Ex #1 Indirection in block

Header

A block: Free

Block header - data at beginning that

describes blockMay contain:

- File ID (or RELATION or DB ID) - This block ID - Record directory

- Pointer to free space- Type of block (e.g. contains recs type 4;

is overflow, …)- Pointer to other blocks “like it”- Timestamp ...

Ex. #2 Use logical block #’s understood by file system

REC ID File ID Block # Record # or Offset

File ID, PhysicalBlock # Block ID

File Syst. Map

File system map may be “Semi-physical”…

File F1: physical address of block 1table of bad blocks:

B57 XXXB107 YYY

Rest can be computed via formula...

Num. Blocks: 20Start Block: 1000Block Size: 100Bad Blocks:

3 20,0007 15,000

File DEFINITION

Where is Block # 2?

Where is Block # 3?

(1) Separating records(2) Spanned vs. Unspanned(3) Mixed record types - Clustering(4) Split records(5) Sequencing(6) Indirection

Options for storing records in blocks

(1) Insertion/Deletion(2) Buffer Management(3) Comparison of Schemes

1 CS 232A: Database System Principles Notes 03: Disk Organization.

Documents

Transcript of 1 CS 232A: Database System Principles Notes 03: Disk Organization.

CSE 232A Graduate Database Systems

CS 4284 Operating Systems...openers, corresponds to on-disk file descriptor 2. on-disk inode – Region on disk, entry in file descriptor table, that stores persistent information

RedrawingtheBoundaryBetween …nvsl.ucsd.edu/assets/talks/2012-05-21-DaMoN-release.pdf · match"disk"seman;cs" We"wantto"keep"these" Rethink"these" 20 ARIES’ Disk-Centric Design

CSE 232A Database System Implementation

CS 245: Database System Principles Notes 03: Disk Organization

3456 3; &./0(+/1% - Department of Electrical and Computer ...blj/CS-590.26/lectures/LectureE.pdf · Disk. Chapter 17 THE PHYSICAL LAYER. 621. RAMAC to the washing machine-size disk

Disk to Disk Clone

1 CS 232A: Database System Principles Introduction.

SSD (Solid State Disk) - Högskolan Dalarnausers.du.se/~hjo/cs/dt1059/presentation/CF1_6_SSD.pdf · SSD (Solid State Disk) drives • Most SSD drives gives very good performance 4x

CS 370: OPERATING SYSTEMS [DISK CHEDULING ALGORITHMSshrideep/courses/cs... · ¨File system structures are maintained on disk and in memory ¨Operations result in structural changesto

CAS CS 460/660 Introduction to Database Systems Disks ... · Disk Space Management Lowest layer of DBMS software manages space on disk (using OS ﬁle system or not?). Higher levels

Chapter 6 External Memory Disk and RAID (Redundant Arrays of Independent Disks) CS-147 Fall 2010 Jonathan Wang.

CS 245Notes 31 CS 245: Database System Principles Notes 03: Disk Organization Hector Garcia-Molina.

CS 505: Thu D. Nguyen Rutgers University, Spring 2003 1 CS 505: Computer Structures Memory and Disk I/O Thu D. Nguyen Spring 2005 Computer Science Rutgers.

Parts Manual (EN) - FactoryCat Manual (EN) MODELS: 17'' Disk Pad Assist 20'' Disk Pad Assist 23'' Disk Pad Assist 17'' Disk Traction 20'' Disk Traction 23'' Disk Traction 26'' Disk

Recap: Characteristics of I/O Disk Storage and File Systemscs.rochester.edu/~sandhya/csc256/lectures/lecture-filesystems.pdf · Disk Storage and File Systems CS 256/456 ... ECC codes

Approximation Algorithms for Unit Disk Graphswebdoc.sub.gwdg.de/ebook/serien/ah/UU-CS/2004-066.pdfApproximation Algorithms for Unit Disk Graphs∗ Erik Jan van Leeuwen Institute for

232A Rev D Tsunami User Manual

Live Disk Forensics on Bare Metal - sites.cs.ucsb.educspensky/slides/Hu-Spensky-OSDFCon2014.pdf · – Appendix B is useful! Live Disk Forensics - 27 CS & HH 11/5/2014 SATA Reconstruction

CS 423 Operating System Design: Disk Scheduling Algorithms · CS 423: Operating Systems Design Disk Scheduling Performance 15 What factors impact disk performance? Seek Time: Time