CS4432: Database Systems II Data Storage 1. Storage in DBMSs DBMSs manage large amounts of data How...

Post on 13-Dec-2015


CS4432: Database Systems II

Data Storage

1

Storage in DBMSs

• DBMSs manage large amounts of data
• How does a DBMS store and manage large amounts of data?
  – Has significant impact on performance
• Design decisions:
  – What representations and data structures best support efficient manipulations of this data?
• To understand why a DBMS applies specific strategies
  – Must first understand how disks work

2

Disks and Files

• DBMS stores information on (“hard”) disks.
• Main memory is only for processing
• This has major implications for DBMS design!
  – READ: transfer data from disk to main memory (RAM).
  – WRITE: transfer data from RAM to disk.
  – Both are high-cost operations, relative to in-memory operations, so must be planned carefully!

3

DBMS vs. OS: Who’s in Control?

• DBMS is in control of managing its data
  – It knows more about the data's structure
  – It knows more about access patterns

4

That is why a DBMS has a Storage Manager & Buffer Manager

5

Understanding Disks

6

Storage Hierarchy

Levels, from fastest to slowest: Cache (all levels), Main Memory, Secondary Storage, Tertiary Storage

Cache (all levels)
• Avg. Size: 256 KB to 1 MB
• Read/Write Time: 10^-8 seconds
• Random Access
• Smallest of all memory, and also the most costly.
• Usually on same chip as processor.
• Easy to manage in Single Processor Environments, more complicated in Multiprocessor Systems.

Main Memory
• Avg. Size: 128 MB to 1 GB
• Read/Write Time: 10^-7 to 10^-8 seconds
• Random Access
• Becoming more affordable.
• Volatile

Secondary Storage
• Avg. Size: 30 GB to 160 GB
• Read/Write Time: 10^-2 seconds
• NOT Random Access
• Extremely Affordable: $0.68/GB!!!
• Can be used for File System, Virtual Memory, or for raw data access.
• Blocking (need buffering)

Tertiary Storage
• Avg. Size: Gigabytes to Terabytes
• Read/Write Time: 10^1 to 10^2 seconds
• NOT Random Access, or even remotely close
• Extremely Affordable: pennies/GB!!!
• Not efficient for any real-time database purposes, could be used in an offline processing environment

7

Storage Hierarchy

8

Memory Hierarchy Summary

[Figure: typical capacity (bytes), axis from 10^3 to 10^15, plotted against access time (sec), axis from 10^-9 to 10^3. Plotted technologies: cache, electronic main, electronic secondary, magnetic optical disks, online tape, nearline tape & optical disks, offline tape.]

9

Memory Hierarchy Summary

[Figure: cost (dollars/MB), axis from 10^-4 to 10^4, plotted against access time (sec), axis from 10^-9 to 10^3. Plotted technologies: cache, electronic main, electronic secondary, magnetic optical disks, online tape, nearline tape & optical disks, offline tape.]

10

Why Not Store Everything in Main Memory?

• Costs too much. $100 will buy you either 16GB of RAM or 360GB of disk today.

• Main memory is volatile. We want data to be saved between runs. (Obviously!)

• Typical hierarchy:
  – Main memory (RAM): processing
  – Disks (secondary storage): persistent storage
  – Tapes & DVDs: archival

11

Motivation

Consider the following algorithm:

For each tuple r in relation R {
    Read the tuple r
    For each tuple s in relation S {
        Read the tuple s
        Append the entire tuple s to r
    }
}

What is the time complexity of this algorithm?

12

Motivation

• Complexity:
  – This algorithm is O(n^2)! Is it always?
  – Yes, if we assume random access of data.
• Hard disks are not efficient at random access!
• Unless the data is organized efficiently, this algorithm may perform much worse than O(n^2) suggests.
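To make the gap concrete, here is a minimal Python sketch that counts the tuple reads performed by the nested loop and prices them with the order-of-magnitude access times from the Storage Hierarchy slide (about 10^-8 s for main memory, about 10^-2 s for disk). The relation sizes, the function names, and the assumption that every tuple read costs one independent access are illustrative only and do not come from the slides.

```python
MEMORY_ACCESS_SEC = 1e-8  # order of magnitude taken from the Storage Hierarchy slide
DISK_ACCESS_SEC = 1e-2    # order of magnitude taken from the Storage Hierarchy slide


def nested_loop_reads(r_size, s_size):
    """Count tuple reads made by the tuple-at-a-time nested loop."""
    reads = 0
    for _ in range(r_size):      # for each tuple r in relation R
        reads += 1               # read the tuple r
        for _ in range(s_size):  # for each tuple s in relation S
            reads += 1           # read the tuple s
    return reads


def estimated_seconds(reads, seconds_per_read):
    """Assumes each read is a separate access costing seconds_per_read."""
    return reads * seconds_per_read


if __name__ == "__main__":
    reads = nested_loop_reads(1000, 1000)  # hypothetical relation sizes
    print("tuple reads:", reads)
    print("all in memory (s):", estimated_seconds(reads, MEMORY_ACCESS_SEC))
    print("every read hits disk (s):", estimated_seconds(reads, DISK_ACCESS_SEC))
```

With 1,000 tuples in each relation the loop performs about 10^6 reads. Under these assumptions that is roughly 0.01 seconds if every read is a memory access, and roughly 10^4 seconds (close to three hours) if every read is a separate disk access.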

13

Disks: Some Facts

• Data is stored and retrieved in units called disk blocks.
  – Disk block size: 512 bytes to 4 KB or 8 KB
• Movement to main memory
  – Must read or write one block at a time
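Because transfers happen in whole blocks, even a small record costs at least one full block of I/O. The following Python sketch counts how many blocks a byte range overlaps. The function name, the byte-offset addressing, and the default 4 KB block size are assumptions made for illustration.

```python
def blocks_touched(offset, length, block_size=4096):
    """Return how many disk blocks overlap the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return last_block - first_block + 1


if __name__ == "__main__":
    print(blocks_touched(0, 100))     # a 100-byte record inside one block
    print(blocks_touched(4000, 200))  # a record that straddles a block boundary
    print(blocks_touched(0, 8192))    # exactly two full blocks
```

The second call shows why record placement matters: a 200-byte record that crosses a block boundary requires two block transfers instead of one.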

14

Disk Components

Platter (2 surfaces)

15

Virtual Cylinder

Disk Head

Platter

Cylinder

16

Tracks divided into Sectors

Track

Sector

Gap

Gaps ≈ 10%

Sectors ≈ 90%

17

Movements

• Arm moves in-out
  – Called seek time
  – Mechanical
• Platter rotates
  – Called latency time
  – Mechanical

18

Actual Disk

19

Disk Controller

[Figure labels: Processor, Memory, Disk Controller, Disk 1, Disk 2, ...]

1. Controls the mechanical movement

2. Transfers data between disks and memory

3. Performs smart buffering and scheduling

20

How big is the disk if:

• There are 4 platters
• There are 8192 tracks per surface
• There are 256 sectors per track
• There are 512 bytes per sector

Size = 2 * num of platters * tracks * sectors * bytes per sector

Size = 2 * 4 * 8192 * 256 * 512

Size = 2^33 bytes / (1024 bytes/KB) / (1024 KB/MB) / (1024 MB/GB)

Size = 2^33 = 2^3 * 2^30 = 8 GB

Remember 1 KB = 1024 bytes, not 1000!
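The same calculation can be checked with a short Python sketch. The function and parameter names are illustrative, and two surfaces per platter is taken from the Disk Components slide.

```python
def disk_capacity_bytes(platters, tracks_per_surface, sectors_per_track,
                        bytes_per_sector, surfaces_per_platter=2):
    """Capacity = surfaces * tracks per surface * sectors per track * bytes per sector."""
    surfaces = surfaces_per_platter * platters
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector


if __name__ == "__main__":
    size = disk_capacity_bytes(4, 8192, 256, 512)
    print(size)               # total bytes
    print(size == 2 ** 33)    # matches the power-of-two form
    print(size / 1024 ** 3)   # size in GB, using 1 GB = 1024^3 bytes
```

Dividing by 1024^3 rather than 1000^3 follows the convention on the slide that 1 KB is 1024 bytes.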

21

Scale of Bytes

22

More Disk Terminology

• Rotation Speed:
  – The speed at which the disk rotates, e.g. 5400 RPM
• Number of Tracks:
  – Typically 10,000 to 15,000.
• Bytes per track:
  – ~10^5 bytes per track
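Rotation speed translates directly into latency time. The Python sketch below converts RPM into the time for one full rotation and into an average rotational latency. Taking the average as half a rotation is a common simplifying assumption (the target sector is equally likely to be anywhere on the track) and is not stated on the slides. The function names are illustrative.

```python
def rotation_time_ms(rpm):
    """Time for one full rotation, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm):
    """Assumes the head waits half a rotation on average."""
    return rotation_time_ms(rpm) / 2


if __name__ == "__main__":
    print(round(rotation_time_ms(5400), 2))
    print(round(average_rotational_latency_ms(5400), 2))
```

At 5400 RPM a full rotation takes about 11.1 ms, so the average wait under this assumption is about 5.6 ms.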

23

Big Question: What about access time?

[Figure: a request "I want block X"; how long until block X is in memory?]

Time = Disk Controller Processing Time + Disk Delay {seek & rotation} + Transfer Time

24
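The formula can be turned into a rough estimate with a short Python sketch. The 5400 RPM and ~10^5 bytes per track figures come from the terminology slide. The controller time, the seek time, the 4 KB block size, and the half-rotation average delay are hypothetical values chosen for illustration. The transfer-time estimate also ignores the gaps between sectors.

```python
def rotation_ms(rpm):
    """Time for one full rotation, in milliseconds."""
    return 60_000.0 / rpm


def transfer_ms(block_bytes, bytes_per_track, rpm):
    """Time for the block to pass under the head (gaps between sectors ignored)."""
    return rotation_ms(rpm) * block_bytes / bytes_per_track


def access_time_ms(controller_ms, seek_ms, rpm, block_bytes, bytes_per_track):
    """Time = controller processing + disk delay (seek + rotation) + transfer."""
    rotational_delay_ms = rotation_ms(rpm) / 2  # assumed average: half a rotation
    return (controller_ms + seek_ms + rotational_delay_ms
            + transfer_ms(block_bytes, bytes_per_track, rpm))


if __name__ == "__main__":
    # Hypothetical inputs: 0.5 ms controller time, 10 ms seek, 4 KB block.
    total = access_time_ms(0.5, 10.0, 5400, 4096, 100_000)
    print(round(total, 2), "ms")
```

With these inputs the estimate comes to roughly 16.5 ms, almost all of it mechanical delay (seek and rotation) rather than transfer time.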