Database Architecture 2 & Storage (web.stanford.edu/class/cs245/spr2019/slides/03-System...)
![Page 1: Database Architecture 2 & Storageweb.stanford.edu/class/cs245/spr2019/slides/03-System... · 2019-12-21 · R = 1/2 revolution HDD Spindle [rpm] Average rotational latency [ms] 4,200](https://reader033.fdocuments.in/reader033/viewer/2022042222/5ec916bf233920076327a2e1/html5/thumbnails/1.jpg)
Database Architecture 2 & Storage
Instructor: Matei Zaharia
cs245.stanford.edu
Summary from Last Time
System R mostly matched the architecture of a modern RDBMS:
» SQL
» Many storage & access methods
» Cost-based optimizer
» Lock manager
» Recovery
» View-based access control
CS 245 2
A Note on Recovery Methods
Jim Gray, “The Recovery Manager of the System R Database Manager”, 1981
Outline
System R discussion
Relational DBMS architecture
Alternative architectures & tradeoffs
Storage hardware
Typical RDBMS Architecture
[Diagram of components: User; Query Parser; Query Planner; User Transaction; Transaction Manager; Concurrency Control; Lock Table; Recovery Manager; Log; Buffer Manager; Buffers; Mem. Mgr.; File Manager; Data; Indexes; Statistics; User Data; System Data]
Boundaries
Some of the components have clear boundaries and interfaces for modularity
» SQL language
» Query plan representation (relational algebra)
» Pages and buffers
Other components can interact closely
» Recovery + buffers + files + indexes
» Transactions + indexes & other data structures
» Data statistics + query optimizer
Differentiating by Workload
Two big classes of commercial RDBMS today
Transactional DBMS: focus on concurrent, small, low-latency transactions (e.g. MySQL, Postgres, Oracle, DB2) → real-time apps
Analytical DBMS: focus on large, parallel but mostly read-only analytics (e.g. Teradata, Redshift, Vertica) → “data warehouses”
How To Design Components for Transactional vs Analytical DBMS?

| Component | Transactional DBMS | Analytical DBMS |
|---|---|---|
| Data storage | B-trees, row-oriented storage | Column-oriented storage |
| Locking | Fine-grained, very optimized | Coarse-grained (few writes) |
| Recovery | Log data writes, minimize latency | Log queries |

How Can We Change the DBMS Architecture?
Decouple Query Processing from Storage Management
Example: big data ecosystem (Hadoop, GFS, etc)
[Diagram of layers: large-scale file systems or blob stores (e.g. GFS); file formats & metadata; processing engines (e.g. MapReduce)]
“Data lake” architecture
Decouple Query Processing from Storage Management
Pros:
» Can scale compute independently of storage (e.g. in datacenter or public cloud)
» Let different orgs develop different engines
» Your data is “open” by default to new tech
Cons:
» Harder to guarantee isolation, reliability, etc
» Harder to co-optimize compute and storage
» Can’t optimize across many compute engines
» Harder to manage if too many engines!
Change the Data Model
Key-value stores: data is just key-value pairs, don’t worry about record internals
Message queues: data is only accessed in a specific FIFO order; limited operations
ML frameworks: data is tensors, models, etc
Change the Compute Model
Stream processing: Apps run continuously and system can manage upgrades, scaleup, recovery, etc
Eventual consistency: handle it at app level
Different Hardware Setting
Distributed databases: need to distribute your lock manager, storage manager, etc, or find system designs that eliminate them
Public cloud: “serverless” databases that can scale compute independently of storage (e.g. AWS Aurora, Google BigQuery)
AWS Aurora Serverless
Typical Server
[Diagram: multiple CPUs, DRAM, an I/O controller, storage devices, and a network card]
Storage Performance Metrics
» Latency (s)
» Throughput (bytes/s)
» Storage capacity (bytes, bytes/$)
[Diagram: a CPU and a storage device, annotated with the metrics above]
Storage Latency

| Level | Relative latency | Analogy: where | Analogy: how long |
|---|---|---|---|
| Registers | 1 | My Head | 1 min |
| L1 Cache | 2 | This Room | |
| L2 Cache | 10 | This Campus | 10 min |
| Memory | 150 | Sacramento | 2 hr |
| Disk | 10^6 | Pluto | 2 Years |
| Tape / Optical Robot | 10^9 | Andromeda | 2,000 Years |

Max Attainable Throughput
Varies significantly by device
» 100 GB/s for RAM
» 2 GB/s for NVMe SSD
» 130 MB/s for hard disk
Assumes large reads (≫1 block)!
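To make the gap concrete, the sketch below computes how long a full sequential scan would take at each of the peak rates above. The 1 TB dataset size and the decimal unit constants are assumptions chosen for illustration; only the rates come from the slide.

```python
# Peak sequential throughput figures quoted above, in bytes/second.
GB = 10**9
MB = 10**6
PEAK_THROUGHPUT = {
    "RAM": 100 * GB,
    "NVMe SSD": 2 * GB,
    "hard disk": 130 * MB,
}

def scan_seconds(num_bytes, device):
    """Best-case time to read num_bytes sequentially from the given device."""
    return num_bytes / PEAK_THROUGHPUT[device]

if __name__ == "__main__":
    dataset = 10**12  # assumed 1 TB dataset (hypothetical size)
    for device in PEAK_THROUGHPUT:
        print(f"{device}: {scan_seconds(dataset, device):,.0f} s")
```

These are best-case numbers: they only hold for large contiguous reads, which is exactly the caveat on the slide.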
Storage Cost
$1000 at NewEgg today buys:
» 0.2 TB of RAM
» 9 TB of NVMe SSD
» 33 TB of magnetic disk
Hardware Trends over Time
Capacity/$ grows exponentially at a fast rate (e.g. double every 2 years)
Throughput grows at a slower rate (e.g. 5% per year), but new interconnects help
Latency does not improve much over time
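A quick compounding calculation shows how fast these two rates diverge. This is only a sketch: the 10-year horizon is an assumption, and the growth rates are the example figures quoted above rather than measured values.

```python
def growth_factor_doubling(years, doubling_period_years=2.0):
    """Growth multiple when a quantity doubles every doubling_period_years."""
    return 2.0 ** (years / doubling_period_years)

def growth_factor_percent(years, percent_per_year=5.0):
    """Growth multiple under compound growth of percent_per_year."""
    return (1.0 + percent_per_year / 100.0) ** years

if __name__ == "__main__":
    years = 10  # assumed horizon
    print(f"capacity/$ grows {growth_factor_doubling(years):.0f}x")
    print(f"throughput grows {growth_factor_percent(years):.2f}x")
```

Under these assumptions capacity per dollar grows 32x in ten years while throughput grows by roughly 1.6x, so the time needed to scan a full device keeps increasing.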
Most Common Permanent Storage: Hard Disks
Terms: Platter, Head, Actuator, Cylinder, Track, Sector (physical), Block (logical), Gap
Top View
Disk Access Time
[Diagram: “I want block X” → ? → block X in memory]
Disk Access Time
Time = Seek Time + Rotational Delay + Transfer Time + Other
Seek Time
[Plot: time (y-axis, marked X and 3-5X) against cylinders traveled (x-axis, 1 to N)]
Typical Seek Time
Ranges from
» 4 ms for high end drives
» 15 ms for mobile devices
In contrast, SSD access time ranges from
» 0.02 ms: NVMe
» 0.16 ms: SATA
Rotational Delay
[Diagram: platter with “Head Here” and “Block I Want” marked]
Average Rotational Delay
R = 1/2 revolution
R = 0 for SSDs

Typical HDD figures

| HDD spindle speed [rpm] | Average rotational latency [ms] |
|---|---|
| 4,200 | 7.14 |
| 5,400 | 5.56 |
| 7,200 | 4.17 |
| 10,000 | 3.00 |
| 15,000 | 2.00 |

Source: Wikipedia, "Hard disk drive performance characteristics"
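The table values follow directly from R = 1/2 revolution. A minimal sketch of that arithmetic (the function name is illustrative, not from the slides):

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational delay: half a revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms per minute
    return 0.5 * ms_per_revolution

if __name__ == "__main__":
    for rpm in (4200, 5400, 7200, 10_000, 15_000):
        print(f"{rpm:>6} rpm -> {avg_rotational_latency_ms(rpm):.2f} ms")
```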
Transfer Rate
Transfer rate T is around 50-130 MB/s
Transfer time: size / T for contiguous read
Block size: usually 512-4096 bytes
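Putting the terms of the disk access time formula together for a single random block read gives a rough estimate like the one below. The particular drive (4 ms seek, 7,200 rpm, 130 MB/s, 4096-byte block) is an assumed combination of the typical figures quoted in these slides, not a specific product.

```python
def random_block_access_ms(seek_ms, rpm, block_bytes, transfer_mb_per_s, other_ms=0.0):
    """Time = seek time + rotational delay + transfer time + other, in ms."""
    rotational_ms = 0.5 * 60_000.0 / rpm  # half a revolution on average
    transfer_ms = block_bytes / (transfer_mb_per_s * 1e6) * 1e3
    return seek_ms + rotational_ms + transfer_ms + other_ms

if __name__ == "__main__":
    # Assumed drive: 4 ms seek, 7,200 rpm, 130 MB/s, one 4096-byte block.
    total = random_block_access_ms(4.0, 7200, 4096, 130.0)
    print(f"random 4 KB read: about {total:.2f} ms")
```

With these numbers the transfer itself takes roughly 0.03 ms, so seek and rotation account for almost all of the cost.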
So Far: Random Block Access
What about reading the “next” block?
If we do things right (i.e., double buffer, stagger blocks, …)
Time to get the next block = block size / T + negligible
Potential slowdowns:
» Skip gap
» Next track
» Discontinuous block placement
Sequential access generally much faster than random access
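The difference can be estimated with a small model: every random block pays a seek plus a rotational delay, while sequential blocks pay only the transfer time. The figures used (10,000 blocks of 4096 bytes, 130 MB/s, 4 ms seek plus 4.17 ms rotation) are assumptions taken from the typical values above, and the one initial seek of the sequential read is ignored.

```python
def read_time_s(num_blocks, block_bytes, transfer_bytes_per_s, per_block_overhead_ms):
    """Total read time when each block pays a fixed positioning overhead."""
    transfer_s = num_blocks * block_bytes / transfer_bytes_per_s
    overhead_s = num_blocks * per_block_overhead_ms / 1000.0
    return transfer_s + overhead_s

if __name__ == "__main__":
    blocks, block_bytes, rate = 10_000, 4096, 130e6   # assumed workload
    sequential = read_time_s(blocks, block_bytes, rate, 0.0)
    random_access = read_time_s(blocks, block_bytes, rate, 4.0 + 4.17)
    print(f"sequential: {sequential:.2f} s")
    print(f"random:     {random_access:.2f} s")
    print(f"slowdown:   {random_access / sequential:.0f}x")
```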
Cost of Writing: Similar to Reading
... unless we want to verify! Then we need to add a (full) rotation + block size / T
Cost To Modify a Block?
To modify a block:
(a) Read block
(b) Modify in memory
(c) Write block
[(d) Verify?]
Performance of DRAM
The same basic issues with “lookup time” vs throughput apply to DRAM
Min read from DRAM is a cache line (64 bytes)
Even 64-byte random reads may not be as fast as sequential ones due to prefetching, page table, controllers, etc
Place co-accessed data together!
Example
Suppose we’re accessing 8-byte records in a DRAM with 64-byte cache line sizes.
How much slower is random vs sequential?
In the random case, we are reading 64 bytes for every 8 bytes we need, so we expect to max out the throughput at least 8x sooner.
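The 8x figure is simply the ratio of bytes fetched to bytes used. A small sketch of that ratio, assuming each randomly accessed record is aligned so that it touches the minimum number of cache lines (the function name is illustrative):

```python
def read_amplification(record_bytes, cache_line_bytes=64):
    """Bytes fetched per useful byte when each record is read on its own."""
    # Ceiling division: number of whole cache lines needed for one record.
    lines_touched = -(-record_bytes // cache_line_bytes)
    return lines_touched * cache_line_bytes / record_bytes

if __name__ == "__main__":
    for size in (8, 16, 64):
        print(f"{size}-byte records: {read_amplification(size):.0f}x")
```

Sequential access avoids this overhead because neighbouring records share the cache lines that are already being fetched.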
Storage Hierarchy
Typically want to cache frequently accessed data at a high level of the storage hierarchy to improve performance
[Diagram: CPU → CPU Cache (KBs-MBs) → DRAM (GBs) → Disk (TBs)]
Sizing Storage Tiers
How much high-tier storage should we have?
Can determine based on workload & cost
The 5 Minute Rule for Trading Memory Accesses for Disc Accesses
Jim Gray & Franco Putzolu, May 1985
The Five Minute Rule
Say a page is accessed every X seconds

Assume a disk costs D dollars and can do I operations/sec; the cost of keeping this page on disk is

$$C_{\text{disk}} = \frac{C_{\text{iop}}}{X} = \frac{D}{I X}$$

Assume 1 MB of RAM costs M dollars and holds P pages; then the cost of keeping it in DRAM is:

$$C_{\text{mem}} = \frac{M}{P}$$

This tells us that the page is worth caching when $C_{\text{mem}} < C_{\text{disk}}$, i.e.

$$X < \frac{D \, P}{I \, M}$$

Source: The Five-minute Rule Thirty Years Later and its Impact on the Storage Hierarchy
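As a sketch of how the rule can be applied, the function below evaluates the break-even interval D·P / (I·M) obtained by rearranging M/P < D/(I·X). All prices and device figures in the example are hypothetical placeholders, not values from the slides or from the papers cited.

```python
def break_even_interval_s(disk_cost, disk_iops, ram_cost_per_mb, pages_per_mb):
    """Largest access interval X (seconds) for which caching still pays off.

    Derived from C_mem < C_disk, i.e. M/P < D/(I*X)  =>  X < D*P / (I*M).
    """
    return (disk_cost * pages_per_mb) / (disk_iops * ram_cost_per_mb)

def worth_caching(access_interval_s, disk_cost, disk_iops, ram_cost_per_mb, pages_per_mb):
    """True if a page accessed every access_interval_s seconds belongs in RAM."""
    limit = break_even_interval_s(disk_cost, disk_iops, ram_cost_per_mb, pages_per_mb)
    return access_interval_s < limit

if __name__ == "__main__":
    # Hypothetical figures: $100 disk, 200 ops/sec, $0.005 per MB of RAM,
    # 256 pages per MB (4 KB pages).
    limit = break_even_interval_s(100.0, 200.0, 0.005, 256)
    print(f"cache pages accessed more often than every {limit:,.0f} s")
```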
Disk Arrays
Many flavors of “RAID”: striping, mirroring, etc. to increase performance and reliability
[Diagram: a disk array that is logically one disk]
Common RAID Levels
Image source: Wikipedia
» Striping across 2 disks: adds performance but not reliability
» Mirroring across 2 disks: adds reliability but not performance
» Striping + 1 parity disk: adds performance and reliability at lower storage cost
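To see why one parity disk is enough to survive a single disk failure, here is a minimal sketch. It assumes the parity block is the byte-wise XOR of the data blocks, which is the usual scheme in parity-based RAID but is not spelled out on the slide; the function names are illustrative.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])

if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]    # one stripe across three data disks
    parity = xor_blocks(data)             # stored on the parity disk
    recovered = rebuild_missing([data[0], data[2]], parity)
    print(recovered == data[1])
```

The storage overhead is one extra disk for the whole array, compared with doubling the number of disks under mirroring.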
Coping with Disk Failures
Detection
» E.g. checksum
Correction
» Requires redundancy
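A minimal sketch of checksum-based detection, assuming a CRC-32 stored alongside each block. The 4-byte trailer layout and the function names are illustrative choices, not a format described in the slides. Note that this only detects corruption; repairing the block still needs a redundant copy.

```python
import zlib

def store_block(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload before writing the block."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def load_block(stored: bytes) -> bytes:
    """Verify the trailing CRC-32 and return the payload, or raise on mismatch."""
    payload, recorded = stored[:-4], int.from_bytes(stored[-4:], "big")
    if zlib.crc32(payload) != recorded:
        raise ValueError("checksum mismatch: block is corrupted")
    return payload

if __name__ == "__main__":
    stored = store_block(b"page data")
    corrupted = bytes([stored[0] ^ 0x01]) + stored[1:]  # flip one bit
    print(load_block(stored))
    try:
        load_block(corrupted)
    except ValueError as err:
        print(err)
```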
At What Level Do We Cope?
Single Disk
» E.g., error-correcting codes on read
Disk Array
[Diagram: Logical → Physical]
Operating System
E.g., network-replicated storage
[Diagram: Logical Block → Copy A, Copy B]
Database System
E.g.,
[Diagram: Log, Current DB, Last week’s DB]
Summary
Storage devices offer various tradeoffs in terms of latency, throughput and cost
In all cases, data layout and access pattern matter because random access is much slower than sequential access
Most systems will combine multiple devices
Assignment 1
Explores the effect of data layout for a simple in-memory database
» Fixed set of supported queries
» Implement a row store, column store, indexed store, and your own custom store!
Will be posted soon on website!
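As a rough illustration of what the layouts mean, the sketch below stores the same table row by row and column by column. The class and method names are hypothetical and are not the assignment's actual interface.

```python
class RowStore:
    """Keeps each record's fields together (one tuple per row)."""
    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(r) for r in rows]

    def get_row(self, i):
        return self.rows[i]                       # one contiguous record

    def sum_column(self, name):
        idx = self.columns.index(name)
        return sum(r[idx] for r in self.rows)     # touches every record

class ColumnStore:
    """Keeps each column's values together (one list per column)."""
    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.data = {c: [r[i] for r in rows] for i, c in enumerate(self.columns)}

    def get_row(self, i):
        return tuple(self.data[c][i] for c in self.columns)  # one lookup per column

    def sum_column(self, name):
        return sum(self.data[name])               # scans a single list

if __name__ == "__main__":
    cols, rows = ["a", "b", "c"], [(1, 10, 100), (2, 20, 200)]
    print(RowStore(cols, rows).sum_column("b"), ColumnStore(cols, rows).get_row(1))
```

Fetching a whole record favours the row layout, while aggregating one column favours the column layout, which mirrors the transactional versus analytical split earlier in the lecture.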