Dalí Main-memory Storage Manager Tomasz Piech. Salvador Dalí - Persistence of Memory (1931)

Introduction

• Dalí
  – Implemented at Bell Laboratories
  – Storage manager for persistent data
  – Architecture optimized for databases resident in main memory
  – Application: real-time billing and control of multimedia content delivery
• High transaction rates, low latency

Introduction

• Dalí techniques
  – Direct access to data: direct pointers to information stored in the database give high performance
  – No interprocess communication: a process communicates with the server only when connecting or disconnecting; concurrency control and logging are provided via shared memory
  – Fault tolerance: advanced multi-level transaction model; high-concurrency indexing and storage

Introduction

• Dalí
  – Recovery from process failure in addition to system failure
  – Use of codewords and memory protection to preserve the integrity of data (discussed later)
  – Consistency of response time: a key requirement for applications with memory-resident data
  – Designed for databases that fit into main memory (virtual memory will work, but not as well)

Overview of Presentation

• Architecture

• Storage

• Transaction Management

• Fault Tolerance

• Concurrency Control

• Collections and Indexing

• Higher Level Interfaces

Architecture

• Database files: hold user data; a database contains one or more of them

• System database files: hold database support data, such as locks and logs

• Files opened by a process are directly mapped into its address space

• The mapping is provided by mmap'ed files or shared-memory segments

Layers of Abstraction

Dalí architecture is organized to support the toolkit approach

Layers of Abstraction

• Toolkit approach
  – Logging can be turned off for data that need not be persistent
  – Locking can be turned off if data is private to a process

• Multiple interface levels
  – Low-level components are exposed to the user for optimization

Storage

Pointers and Offsets

• Each process has a database-offset table
  – Specifies where in memory a file is mapped
  – Implemented as an array indexed by file id

• Primary Dalí pointer (p)
  – Database file local identifier and offset within the file
  – To dereference, add the offset from p to the file's virtual memory address from the offset table

• Secondary pointer
  – For an index kept within one file, store just the offset, since the location of the file is known
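
The dereferencing rule above can be sketched in a few lines. This is only an illustration, not Dalí's actual code: the names `DaliPointer`, `Process`, `map_file` and `dereference` are hypothetical, and plain integers stand in for virtual addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DaliPointer:
    """Hypothetical primary pointer: file-local identifier plus offset."""
    file_id: int
    offset: int


class Process:
    """Holds the per-process database-offset table (an array indexed by file id)."""

    def __init__(self, max_files):
        # Entry i is the base address where file i is mapped, or None if unmapped.
        self.offset_table = [None] * max_files

    def map_file(self, file_id, base_address):
        self.offset_table[file_id] = base_address

    def dereference(self, p):
        base = self.offset_table[p.file_id]
        if base is None:
            raise KeyError(f"file {p.file_id} is not mapped in this process")
        # Primary pointer: base address from the table plus the stored offset.
        return base + p.offset

    def dereference_secondary(self, file_id, offset):
        # Secondary pointer: only the offset is stored; the file is known from context.
        return self.dereference(DaliPointer(file_id, offset))
```

Because each process keeps its own table, the same pointer value stays valid even when two processes map the file at different addresses.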

Storage Allocation

• Motivation
  – Control data should be stored separately from user data
    • Protects control data from stray pointers
  – Indirection should not exist at the lowest level
    • Indirection adds a level of latching for each data access and increases the path length of the dereference itself
    • Dalí exposes direct pointers to allocated data, providing time and space efficiency

Storage Allocation

• Motivation
  – Large objects should be stored contiguously
    • The advantage is speed; recreating a file from smaller files takes away that advantage
  – Different recovery characteristics should be available for different regions of the database
    • Not all data needs to be recovered after a crash
    • Indexes can be rebuilt, etc.

Storage Allocation

– Two levels of non-recovered data
  • Zeroed memory: remains allocated after recovery but is zeroed
  • Transient memory: data is no longer allocated upon recovery

Segments and Chunks

• Segment
  – Contiguous, page-aligned unit of allocation; arbitrarily large; database files are composed of segments

• Chunk
  – A collection of segments

Segments and Chunks

• Allocators
  – Return standard Dalí pointers to allocated space within a chunk; indirection is not imposed at the storage manager level
  – No record of allocated space is retained

• Three different allocators
  – Power-of-two: allocates buckets of size 2^i * m
  – Inline power-of-two: as above, but the free-space list uses the first few bytes of each free block

Segments and Chunks

• Allocators (cont'd)
  – Coalescing allocator: merges adjacent free space and uses a free tree
  – The power-of-two and inline power-of-two allocators are faster, but neither coalesces adjacent free space, which leads to fragmentation (thus suited to fixed-size records only)
  – The coalescing allocator uses a free tree, based on the T-tree, to keep track of free space; allocation and freeing take logarithmic time
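
To make the 2^i * m bucket rule concrete, here is a minimal sketch of a power-of-two allocator. It is an illustration under stated assumptions, not Dalí's implementation: the class and function names are hypothetical, offsets stand in for pointers, and the caller passes the size back on `free` (one possible reading of "no record of allocated space is retained").

```python
def bucket_index(size, m):
    """Smallest i such that 2**i * m >= size."""
    if size <= 0 or m <= 0:
        raise ValueError("size and m must be positive")
    i = 0
    while (m << i) < size:
        i += 1
    return i


class PowerOfTwoAllocator:
    """Hypothetical allocator over one chunk; returns offsets, never coalesces."""

    def __init__(self, m, chunk_size):
        self.m = m
        self.chunk_size = chunk_size
        self.next_free = 0      # bump pointer into never-used space
        self.free_lists = {}    # bucket index -> list of free block offsets

    def alloc(self, size):
        i = bucket_index(size, self.m)
        free = self.free_lists.setdefault(i, [])
        if free:
            return free.pop()   # reuse a freed block of the same bucket size
        block = self.m << i
        if self.next_free + block > self.chunk_size:
            raise MemoryError("chunk exhausted")
        offset = self.next_free
        self.next_free += block
        return offset

    def free(self, offset, size):
        # Assumption: the caller supplies the size, since the allocator keeps
        # no per-allocation record. Freed blocks are not merged with neighbours.
        self.free_lists.setdefault(bucket_index(size, self.m), []).append(offset)
```

A freed block can only satisfy a later request in the same bucket, which is why this style of allocator fragments under variable-size workloads.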

Page Table & Segment Headers

• Segment header: associates information about a segment/chunk with a physical pointer
  – Allocated when a segment is added to a chunk
  – Can store additional information about the data in the segment

• Page table: maps pages to segment headers
  – Pre-allocated based on the maximum number of pages in the database

Transaction Management

Recovery

System Overview

Checkpointing

Transaction Management in Dalí

• Transaction atomicity, isolation & durability in Dalí

• Regions: logically organized data
  – A tuple, an object, or an arbitrary data structure (a tree or a list)

• Region lock: an exclusive (X) or shared (S) lock that guards accesses and updates to a region

Multi-Level Recovery

• Permits use of weaker operation locks in place of X/S region locks

• Example: index management
  – Consider an update to the index structure (e.g., an insert)
  – With physical undo alone, the physical undo description must remain valid until transaction commit, so the region lock must be held until then
    • Unacceptable level of concurrency

Multi-level Recovery

– Replace low-level physical undo log records with higher-level logical undo log records (description at operation level)

– For an insert, the logical-undo record replaces the physical-undo record by specifying that the inserted key must be deleted

– Region locks can then be released while less restrictive operation locks persist, giving a higher level of concurrency

Multi-level Recovery

• An example of find and insert ?

• Releasing region locks allows later updates on the same region
  – Cascading aborts: physically rolling back the first operation would damage the effects of later actions
  – Only a compensating undo operation can be used to undo the operation

Multi-level Recovery Example

System Overview

• Stored on disk:
  – Two checkpoint images, Ckpt_A and Ckpt_B
  – cur_ckpt: anchor to the most recent valid checkpoint image for the database
  – A single system log containing redo information, with its tail in memory

• end_of_stable_log: a pointer; all records prior to it have been flushed to the stable system log

System Overview

• Stored in the system database and with each checkpoint
  – Active Transaction Table (ATT)
    • Stores separate redo and undo logs for each active transaction
  – dpt: dirty page table; records the pages updated since the last checkpoint
  – ckpt_dpt: the dpt stored in a checkpoint

Transactions and Operations

• Transaction: a list of operations
  – Each operation has a level Li associated with it
  – An operation at level Li can consist of operations at level L(i-1)
  – L0 operations are physical updates to regions
  – Pre-commit: the commit record enters the system log in memory
  – Commit: the commit record hits stable storage

Logging Model

– Updates generate physical undo and redo log records, which are appended to the Tx's undo and redo logs (in the ATT)

– When a Tx or operation pre-commits, its redo log records are appended to the system log; for an operation, the logical undo is included in the operation commit log record in the system log

– When an operation pre-commits, the undo log records for its sub-operations/updates are deleted from the Tx's undo log, and the operation's logical undo is appended to the Tx's undo log

Logging Model

– Locks are released once the Tx/operation pre-commits
– The system log is flushed to disk when the Tx commits
– Dirty pages are marked in the dpt by the flushing procedure, so no page latching is needed
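
The bookkeeping at operation pre-commit can be sketched as follows. This is a simplified model for illustration only: the record tuples, the `Transaction` class and the function names are hypothetical, the database is a plain dict, and locking and the stable log are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Per-transaction logs, as kept in the ATT."""
    undo_log: list = field(default_factory=list)
    redo_log: list = field(default_factory=list)


def physical_update(tx, db, key, new_value):
    # Each update appends a physical undo and a physical redo record.
    tx.undo_log.append(("physical-undo", key, db.get(key)))
    tx.redo_log.append(("physical-redo", key, new_value))
    db[key] = new_value


def operation_precommit(tx, system_log, first_undo_index, logical_undo):
    # Redo records move to the (in-memory tail of the) system log, followed by
    # an operation commit record that carries the logical undo description.
    system_log.extend(tx.redo_log)
    system_log.append(("op-commit", logical_undo))
    tx.redo_log.clear()
    # The operation's physical undo records are replaced by one logical undo.
    del tx.undo_log[first_undo_index:]
    tx.undo_log.append(("logical-undo", logical_undo))
```

For example, an index insert that touched two regions leaves a single `("logical-undo", ("delete", key))` entry in the undo log once the operation pre-commits.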

Ping-pong Checkpointing

– Traditionally, systems implement write-ahead logging (WAL) for recovery, but WAL cannot be enforced without latches

– Latches increase access cost in main memory and interfere with normal processing

– Solution: store two copies of the database image on disk; dirty pages are written to alternate checkpoints

– Fuzzy checkpointing: no latches used, no interference with normal operations

Ping-pong Checkpointing

• Checkpoints are allowed to be temporarily inconsistent: updates may be written out without their undo records

• Redo and undo information from the ATT is written out with the checkpoint and is used to bring it to a consistent state

• If a failure occurs during checkpointing, the other checkpoint is still consistent and can be used for recovery

Ping-pong Checkpointing

• A log flush is necessary at the end of checkpointing, before toggling cur_ckpt: a commit might take place before the ATT is written out, leaving no undo information if the system crashes
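
The alternation between the two images can be sketched as below. This is a toy model, not Dalí's code: dictionaries stand in for the on-disk images, the per-image dirty sets are an assumption about how "pages updated since the last checkpoint" are tracked for each image, and writing out the ATT is omitted.

```python
class PingPongCheckpointer:
    """Hypothetical model of ping-pong checkpointing over two images."""

    def __init__(self):
        self.images = {"Ckpt_A": {}, "Ckpt_B": {}}   # stand-ins for disk images
        self.cur_ckpt = "Ckpt_A"                     # anchor: last valid image
        # Assumption: one dirty set per image, so a page updated once is
        # eventually written to both images.
        self.dirty = {"Ckpt_A": set(), "Ckpt_B": set()}

    def note_update(self, page_id):
        for pages in self.dirty.values():
            pages.add(page_id)

    def checkpoint(self, memory, flush_log):
        # Always write to the image that is NOT the current anchor.
        target = "Ckpt_B" if self.cur_ckpt == "Ckpt_A" else "Ckpt_A"
        for page_id in sorted(self.dirty[target]):
            self.images[target][page_id] = memory[page_id]
        self.dirty[target].clear()
        flush_log()               # log flush before the anchor is toggled
        self.cur_ckpt = target    # toggle only once the image is complete
        return target
```

If a crash interrupts `checkpoint`, `cur_ckpt` still names the untouched image, which is what makes the partially written one safe to discard.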

Abort Processing

• Upon abort, the Tx's undo log records are undone by traversing the undo log sequentially from the end

• A new physical-redo log record is created for every physical-undo record encountered

• Similarly, for each logical-undo record, a "compensation" operation (a "proxy") is executed

• All undo log records are deleted when the proxy commits

Abort Processing

• The commit record for a proxy is similar to the compensation log records (CLRs) in ARIES

• During recovery, a logical-undo log record is deleted from the Tx's undo log if a CLR is encountered, preventing the Tx from being undone again
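
A minimal sketch of the abort loop described on these two slides follows. The record layout, the `compensations` lookup table and the `"proxy-commit"` record name are assumptions made for illustration; they are not taken from Dalí.

```python
def abort(undo_log, db, redo_log, compensations):
    """Undo a transaction by walking its undo log backwards.

    undo_log      -- list of ("physical-undo", key, old_value) or
                     ("logical-undo", op_name, arg) tuples (hypothetical layout)
    compensations -- maps op_name to a function(db, arg) that logically
                     reverses the operation (the "proxy")
    """
    while undo_log:
        record = undo_log.pop()          # traverse from the end
        if record[0] == "physical-undo":
            _, key, old_value = record
            db[key] = old_value
            # Every physical undo is itself logged as a physical redo.
            redo_log.append(("physical-redo", key, old_value))
        else:
            _, op_name, arg = record
            compensations[op_name](db, arg)
            # The proxy's commit record plays the role of a CLR.
            redo_log.append(("proxy-commit", op_name, arg))
```

Logging the proxy's commit is what lets a later recovery pass see that the logical undo already happened and skip it.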

Recovery

• end_of_stable_log is where recovery begins
• Initializes the ATT and undo logs with copies from the last checkpoint
• Loads the database image and sets the dpt to zero
• Applies all redo log records following the begin-recovery-point
• Then all active transactions are rolled back
  – First all completed L0 operations must be rolled back, then L1, then L2, and so on

Post-commit Operations

• Operations guaranteed to be carried out after commit of a transaction/operation even if the system crashes

• Some operations cannot be rolled back once performed (e.g., a deletion followed by allocation of the same space to a different operation)

• High concurrency on the storage allocator must be ensured, so locks cannot be held

• Solution: perform these operations after the transaction commits (keep a post-commit log)
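
One way to picture the post-commit log is as a queue of deferred actions that is drained only after commit. The sketch below is hypothetical (the `Tx` class and its method names are not Dalí's); it keeps an entry in the log until its action has finished, so that a re-run after a failure would pick it up again.

```python
class Tx:
    """Hypothetical transaction with a post-commit log of deferred actions."""

    def __init__(self):
        self.post_commit_log = []
        self.committed = False

    def defer(self, action, *args):
        # Record the action instead of performing it, e.g. freeing storage.
        self.post_commit_log.append((action, args))

    def commit(self):
        self.committed = True
        self.run_post_commit()

    def run_post_commit(self):
        # Could also be invoked by recovery or a cleanup agent for a
        # committed transaction whose post-commit actions are still pending.
        while self.post_commit_log:
            action, args = self.post_commit_log[0]
            action(*args)
            self.post_commit_log.pop(0)   # drop the entry only after it ran
```

Deferring a free this way means the space cannot be handed to another transaction while the freeing transaction might still abort.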

Fault Tolerance

Process Death and Its Detection

Fault Tolerance

• Techniques that help cope with process failure scenarios

Process Death

• Caused by an attempt to access invalid memory, or by an operator kill

• Shared data left partially updated must be returned to a consistent state

• Any uncommitted transactions owned by that process must be aborted

• Cleanup server is primarily responsible for cleaning up dead processes

Process Death

• Active Process Table (APT): keeps track of all processes in the system; scanned periodically to check whether any have died

• Low-level cleanup
  – A process registers with the APT any latch it acquires
  – If a latch is held by a dead process, the cleanup function for that latch is called
  – If the latch cannot be cleaned up, a system crash is simulated
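
The periodic APT scan might look roughly like this. Everything here is a hypothetical sketch: the liveness probe is passed in as a callback rather than tied to a particular OS call, latches are identified by name, and `SimulatedCrash` is an exception standing in for "simulate a system crash".

```python
class SimulatedCrash(Exception):
    """Raised when a dead process's latch cannot be cleaned up."""


class ActiveProcessTable:
    def __init__(self, is_alive):
        self.is_alive = is_alive   # hypothetical probe: pid -> bool
        self.latches = {}          # pid -> set of latch names it has registered

    def register_process(self, pid):
        self.latches.setdefault(pid, set())

    def register_latch(self, pid, latch):
        self.latches[pid].add(latch)

    def release_latch(self, pid, latch):
        self.latches[pid].discard(latch)

    def scan(self, cleanup_functions):
        """Clean up after dead processes; return the list of dead pids."""
        dead = [pid for pid in self.latches if not self.is_alive(pid)]
        for pid in dead:
            for latch in sorted(self.latches[pid]):
                cleanup = cleanup_functions.get(latch)
                if cleanup is None:
                    # No way to restore the protected structure.
                    raise SimulatedCrash(latch)
                cleanup()
            del self.latches[pid]
        return dead
```
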

Process Death

• Cleaning up transactions
  – Cleanup agent: scans the Tx table and aborts any Tx running on behalf of the dead process, or executes post-commit actions for a committed Tx
  – Multiple cleanup agents are spawned if multiple processes have died

Protection from Application Errors

• Memory protection
  – munprotect is called right before an update to a page, and mprotect after the Tx commits to protect the pages again

• Codewords
  – Associate a logical parity word with each page of data
  – Erroneous writes will update only the physical data, not the codeword; a crash is simulated if an error is found
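
The codeword idea can be illustrated with a small sketch. It assumes the "logical parity word" is the XOR of the page's 32-bit words and a 4096-byte page; both are assumptions for the example, and `ProtectedPage` is a hypothetical name.

```python
import struct
from functools import reduce

PAGE_SIZE = 4096  # assumption: page size in bytes


def codeword(data):
    """XOR of all 32-bit little-endian words in data (length multiple of 4)."""
    words = struct.unpack(f"<{len(data) // 4}I", data)
    return reduce(lambda a, b: a ^ b, words, 0)


class ProtectedPage:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.codeword = 0   # parity of an all-zero page

    def update(self, offset, new_bytes):
        """Legitimate update path: changes the data and maintains the codeword."""
        if offset % 4 or len(new_bytes) % 4:
            raise ValueError("sketch supports word-aligned updates only")
        old = bytes(self.data[offset:offset + len(new_bytes)])
        # XOR out the old words and XOR in the new ones.
        self.codeword ^= codeword(old) ^ codeword(new_bytes)
        self.data[offset:offset + len(new_bytes)] = new_bytes

    def audit(self):
        """True if the stored codeword still matches the page contents."""
        return codeword(bytes(self.data)) == self.codeword
```

A write that bypasses `update`, such as one through a stray pointer, changes the data but not the stored codeword, so the next audit fails.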

Concurrency Control

Implementation of Latches

Concurrency Control

• Concurrency control facilities:
  – Latches (low-level locks for mutual exclusion)
  – Queuing locks

• Latch implementation
  – Semaphores are too expensive because of system call overhead
  – The implementation must complement the cleanup server

Latch Implementation

• Processes that wish to acquire a latch keep a pointer to that latch in their wants field

• While the cleanup-in-progress flag is set to True, processes are forbidden from attempting to acquire a latch

• The cleanup server waits for each process to set its wants field to null or to another latch, or to die

• If a dead process is the registered owner of the latch, the cleanup function is called

Locking System

• Lock header structure
  – Stores a pointer to a list of locks that have been requested (but not released) by transactions
  – A request times out if it is not granted within a certain amount of time

• New lock modes can be added through the use of conflicts and covers
  – covers: the holder of lock A checks for conflicts when requesting a new lock of type B, unless A covers B
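
The conflicts/covers check can be sketched with two small relations. The S/X tables below are an illustrative assumption (the slides do not list Dalí's actual tables), and `can_grant` is a hypothetical helper that ignores queuing and timeouts.

```python
# (held_mode, requested_mode) pairs that conflict between different transactions.
CONFLICTS = {("S", "X"), ("X", "S"), ("X", "X")}

# (held_mode, requested_mode) pairs where the held lock already covers the request.
COVERS = {("S", "S"), ("X", "S"), ("X", "X")}


def can_grant(requester, mode, held):
    """held is a list of (transaction, mode) pairs on one lock header."""
    # If the requester already holds a covering lock, no conflict check is needed.
    if any(tx == requester and (m, mode) in COVERS for tx, m in held):
        return True
    # Otherwise check the request against locks held by other transactions.
    return not any(tx != requester and (m, mode) in CONFLICTS for tx, m in held)
```

Adding a new lock mode then amounts to adding entries to the two relations rather than changing the lock manager's logic.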

Collections and Indexing

Heap Files

Extendible Hashing

Collections and Indexing

• Dalí provides higher-level interfaces for grouping related data items and for performing scans and associative access on items in a group

• Heap file
  – Abstraction for handling a large number of fixed-length data items
  – Scans are supported through bitmaps in the segment header
  – Entries deleted from the heap are 0 in the bitmap
  – The bitmap mirrors the allocator's free list
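
A bitmap-driven scan over fixed-length slots can be sketched as follows. `HeapSegment` is a hypothetical stand-in for one segment of a heap file; a Python integer plays the role of the bitmap kept in the segment header.

```python
class HeapSegment:
    """Hypothetical segment holding a fixed number of fixed-length slots."""

    def __init__(self, slots):
        self.items = [None] * slots
        self.bitmap = 0            # bit i is 1 while slot i holds a live item

    def insert(self, item):
        for i in range(len(self.items)):
            if not (self.bitmap >> i) & 1:   # first free slot
                self.items[i] = item
                self.bitmap |= 1 << i
                return i
        raise MemoryError("segment full")

    def delete(self, i):
        self.bitmap &= ~(1 << i)   # deleted entries are 0 in the bitmap

    def scan(self):
        # Visit only the slots whose bit is set.
        for i in range(len(self.items)):
            if (self.bitmap >> i) & 1:
                yield self.items[i]
```
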

Collections and Indexing

• Extendible hashing
  – Similar to what was covered in CS 432
  – Utilization factor: determines when to double the directory; more tolerant than a bucket-overflow trigger; avoids space and utilization problems

Extendible Hashing

T-tree indexes

• Briefly: internal nodes, semi-leaf & leaf nodes

• To search for a value, at each node check whether the key is bounded by the node's left-most and right-most key values. If so, the key is returned if it is contained in the node; otherwise traverse further down the tree
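
The search rule can be sketched as below. `TNode` and `ttree_search` are hypothetical names; the sketch covers lookup only and leaves out insertion, rebalancing and the distinction between internal, semi-leaf and leaf nodes.

```python
from bisect import bisect_left


class TNode:
    """Hypothetical T-tree node: a sorted array of keys plus two children."""

    def __init__(self, keys, left=None, right=None):
        self.keys = sorted(keys)
        self.left = left
        self.right = right


def ttree_search(node, key):
    """Return True if key is stored in the tree rooted at node."""
    while node is not None:
        if key < node.keys[0]:
            node = node.left       # below this node's left-most key
        elif key > node.keys[-1]:
            node = node.right      # above this node's right-most key
        else:
            # Bounding node: the key is either in this node or not in the tree.
            i = bisect_left(node.keys, key)
            return i < len(node.keys) and node.keys[i] == key
    return False
```

Only the two boundary keys are compared on the way down; the binary search inside a node happens once, at the bounding node.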

Higher Level Interfaces

• Two database management systems built on Dalí
  – Dalí Relational Manager
  – Main-Memory ODE object-oriented database