Contents
1/31/2017 2
• Overview
• Buffer Pool
• Buffer Read
• LRU Replacement
• Flusher
• Doublewrite Buffer
• Synchronization
OVERVIEW
1/31/2017 3
Overview
1/31/2017 4
• Buffer Pool
– Considering memory hierarchy
– Caching frequently accessed data into DRAM like a cache memory in CPU
– Exploit locality
Overview
• InnoDB Architecture
1/31/2017 5
Handler API
Transaction (trx)
Logging & Crash
Recovery(log & recv)
Mini Transaction
(mtr)
Lock(lck)
Cursor (cur)
Row (row)
B-tree (btr)
Page (page)
Buffer Manager (buf)
Free space / File Management (fsp / fil)
IO
Overview
• Buffer Manager– Buffer Pool (buf0buf.cc) : buffer pool manager
– Buffer Read (buf0read.cc) : read buffer
– LRU (buf0lru.cc) : buffer replacement
1/31/2017 6
Overview
• Buffer Manager– Flusher (buf0flu.cc) : dirty page writer & background flusher
– Doublewrite (buf0dblwr.cc) : doublewrite buffer
1/31/2017 7
BUFFER POOL
1/31/2017 8
Lists of Buffer Blocks
• Free list
– Contains free page frames
• LRU list
– Contains all the blocks holding a file page
• Flush list
– Contains the blocks holding file pages that have been modified in the memory but not written to disk yet
1/31/2017 9
DatabaseBuffer
TailHead D D D
Main LRU list
Free list
D
Flush list D D
Buffer Pool Mutex
• The buffer buf_pool contains a single mutex
– buf_pool->mutex: protects all the control data structures of the buf_pool
• The buf_pool->mutex is a hot-spot in main memory
– Causing a lot of memory bus traffic on multiprocessor systems when processors alternatively access the mutex
– A solution to reduce mutex contention• To create a separate lock for the page hash table
1/31/2017 10
Buffer Pool Struct
• include/buf0buf.h: buf_pool_t
1/31/2017 11
Buffer Pool Struct
• include/buf0buf.h: buf_pool_t
1/31/2017 12
Buffer Pool Struct
• include/buf0buf.h: buf_pool_t
1/31/2017 13
Buffer Pool Struct
• include/buf0buf.h: buf_pool_t
1/31/2017 14
Buffer Pool Init
• buf/buf0buf.cc: buf_pool_init()
1/31/2017 15
...
Buffer pool init per instance
Buffer Pool Init
• buf/buf0buf.cc: buf_pool_init_instance()
1/31/2017 16
Create mutex for buffer pool
Buffer Pool Init
• buf/buf0buf.cc: buf_pool_init_instance()
1/31/2017 17
Initialize buffer chunk
Buffer Pool Init
• buf/buf0buf.cc: buf_pool_init_instance()
1/31/2017 18
Create page hash table
Buffer Chunk
• Total buffer pool size
= x * innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size
1/31/2017 19
Buffer Chunk Struct
• include/buf0buf.ic: buf_chunk_t
1/31/2017 20
[0]
[…]
[N]
[0] [1] [3][2]
[…] […] [N][…]blocks
frames
Buffer chunk mem
• buf/buf0buf.cc: buf_chunk_init()
Buffer Pool Init
1/31/2017 21
Allocate chunk mem(blocks + frames)
• buf/buf0buf.cc: buf_chunk_init()
Buffer Pool Init
1/31/2017 22
Allocate control blocks
Allocate frames(Page size is aligned)
• buf/buf0buf.cc: buf_chunk_init()
Buffer Pool Init
1/31/2017 23
Add all blocks to free list
Initialize control block
Buffer Control Block (BCB)
• The control block contains
– Read-write lock
– Buffer fix count• Which is incremented when a thread wants a file page to be fixed in a buffer
frame
• The buffer fix operation does not lock the contents of the frame
– Page frame
– File pages• Put to a hash table according to the file address of the page
1/31/2017 24
Buffer Control Block (BCB)
1/31/2017 25
Page
Frame ptr
Mutex
RW lock
Lock hash value
…
buf_block_t
Table space ID
Page offset
Buffer fix count
IO fix
State
Hash
List (LRU, free, flush)
…
buf_page_t
• include/buf0buf.h: buf_block_t
BCB Struct
1/31/2017 26
• include/buf0buf.h: buf_block_t
BCB Struct
1/31/2017 27
• include/buf0buf.h: buf_page_t
BCB Struct
1/31/2017 28
Page identification
• include/buf0buf.h: buf_page_t
BCB Struct
1/31/2017 29
...
...
• include/buf0buf.h: buf_page_t
BCB Struct
1/31/2017 30
• buf/buf0buf.cc: buf_block_init()
BCB Init
1/31/2017 31
Set data frame
... Create block mutex & rw lock
BUFFER READPART 1 : READ A PAGE
1/31/2017 32
MySQL Buffer Manager: Read
Database on Flash SSD
DatabaseBuffer
3. Read a page
Tail
1. Search free list
Head D D D
Main LRU List
Free list
Dirty Page Set
D D
Scan LRU List from tail
Double Write Buffer
2. Flush Dirty Pages
D
Buffer Read
• Read a page (buf0rea.cc)
– Find a certain page in the buffer pool using hash table
– If it is not in the buffer pool, then read a block from the storage• Allocate a free block for read (include buffer block)
• Two cases
– Buffer pool has free blocks
– Buffer pool doesn’t have a free block
• Read a page
1/31/2017 34
Buffer Read
1/31/2017 35
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 36
Get the buffer pool ptr using space & offset
** 2 important things **1) ID of a page is (space, offset) of the page2) Buffer pool – page mapping is mapped
Buffer Read
• include/buf0buf.ic: buf_pool_get()
1/31/2017 37
Make a fold number
Buffer Read
• Why fold?
– They want to put pages together in the same buffer pool if it is the same extents for read ahead
1/31/2017 38
Page 0
Page 1
…
Page 63
Page 64
Page 65
…
Page 127
Fold 0 Fold 1
…
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 39
Get page hash lock before searching in the hash table
Set shared lock on hash table
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 40
...
...
Find a page in the hash table
Page doesn’t exist in buffer pool
Read the page from the storage
retry
success
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 41
Fail
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 42
If it failed to read target page, go to the first part of the loop
Buffer Read
• buf/buf0rea.cc: buf_read_page()
1/31/2017 43
Buffer Read
• buf/buf0rea.cc: buf_read_page_low()
1/31/2017 44
Buffer Read
• buf/buf0rea.cc: buf_read_page_low()
1/31/2017 45
Allocate buffer block for read
Buffer Read
• buf/buf0buf.cc: buf_page_init_for_read()
1/31/2017 46
Buffer Read
• buf/buf0buf.cc: buf_page_init_for_read()
1/31/2017 47
Get free block: see this later
Buffer Read
• buf/buf0buf.cc: buf_page_init_for_read()
1/31/2017 48
Initialize buffer page for current read
Buffer Read
• buf/buf0buf.cc: buf_page_init()
1/31/2017 49
Buffer Read
• buf/buf0buf.cc: buf_page_init()
1/31/2017 50
Buffer Read
• buf/buf0buf.cc: buf_page_init()
1/31/2017 51
Insert a page into hash table
Buffer Read
• buf/buf0buf.cc: buf_page_init_for_read()
1/31/2017 52
Add current block to LRU list: see this later
Set io fix to BUF_IO_READ
Buffer Read
• buf/buf0buf.cc: buf_page_init_for_read()
1/31/2017 53
Increase pending read count;How many buffer read were requested but not finished
Buffer Read
• We allocate a free buffer and control block
• And the block was inserted into hash table and LRU list of corresponding buffer pool
• Now, we need to read the real content of the target page from the storage
1/31/2017 54
Buffer Read
• buf/buf0rea.cc: buf_read_page_low()
1/31/2017 55
Read a page from storage: see this later
Buffer Read
• buf/buf0rea.cc: buf_read_page_low()
1/31/2017 56
Complete the read request
Buffer Read
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 57
Get io type (In this case, BUF_IO_READ)
Buffer Read
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 58
Get io type (In this case, BUF_IO_READ)
...
Page corruption check based on checksum in the page
Buffer Read
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 59
Set io fix to BUF_IO_NONE
Buffer Read
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 60
Decrease pending read count
BUFFER READPART 2 : AFTER GOT BLOCK
1/31/2017 61
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 62
...
Set access time
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 63
Do read ahead process (default=false)
LRU REPLACEMENTPART 1 : ADD BLOCK
1/31/2017 64
LRU Add Block
• buf/buf0buf.cc: buf_page_init_for_read()
1/31/2017 65
Add current block to LRU list
LRU Add Block
• buf/buf0lru.cc: buf_LRU_add_block()
1/31/2017 66
LRU Add Block
• buf/buf0lru.cc: buf_LRU_add_block_low()
1/31/2017 67
If list is too small, then put current block to first of the list
LRU Add Block
• buf/buf0lru.cc: buf_LRU_add_block_low()
1/31/2017 68
Else, insert current block to after LRU_old pointer
LRU REPLACEMENTPART 2 : GET FREE BLOCK
1/31/2017 69
LRU Get Free Block
• This function is called from a user thread when it needs a clean block to read in a page
– Note that we only ever get a block from the free list
– Even when we flush a page or find a page in LRU scan we put it to free list to be used
1/31/2017 70
LRU Get Free Block
• iteration 0:
– get a block from free list, success: done
– if there is an LRU flush batch in progress:• wait for batch to end: retry free list
– if buf_pool->try_LRU_scan is set• scan LRU up to srv_LRU_scan_depth to find a clean block
• the above will put the block on free list
• success: retry the free list
– flush one dirty page from tail of LRU to disk (= single page flush)• the above will put the block on free list
• success: retry the free list
1/31/2017 71
LRU Get Free Block
• iteration 1:
– same as iteration 0 except:• scan whole LRU list
• iteration > 1:
– same as iteration 1 but sleep 100ms
1/31/2017 72
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_get_free_block()
1/31/2017 73
Get buffer pool mutex
Get a free block from the free list
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_get_free_block()
1/31/2017 74
Getting a free block succeeded
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_get_free_block()
1/31/2017 75
If already background flushed started, wait for it to end
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_get_free_block()
1/31/2017 76
Find a victim page to replace and make a free block
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_scan_and_free_block()
1/31/2017 77
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_scan_and_free_block()
1/31/2017 78
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_free_from_common_LRU_list()
1/31/2017 79
Try to free it
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_free_page()
1/31/2017 80
Get hash lock & block mutex
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_free_page()
1/31/2017 81
If current page is dirty and not flushed to disk yet, exit
• buf/buf0lru.cc: buf_LRU_free_page()– After func_exit, we are on clean case!
LRU Get Free Block
1/31/2017 82
...
...
...
• buf/buf0lru.cc: buf_LRU_block_remove_hashed()
LRU Get Free Block
1/31/2017 83
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_get_free_block()
1/31/2017 84
...
If we have free block(s), go to loop
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_get_free_block()
1/31/2017 85
If we failed to make a free block, do a single page flush
FLUSH A PAGEPART 1 : SINGLE PAGE FLUSH
1/31/2017 86
Flush a Page
• Flushing a page by
– A background flusher batch flush (LRU & flush list)
– A single page flush in LRU_get_free_block()
• Background flusher
– Regularly check system status (per 1000ms)
– Flush all buffer pool instances in a batch manner
1/31/2017 87
Single Page Flush
1/31/2017 88
buf_LRU_get_free_block()
buf_flush_single_page_from_LRU()
buf_flush_ready_for_flush() buf_flush
_page()buf_flush_write_block_low()
buf_dblwr_write_single_page()
fil_io()
fil_flush()
buf_page_io_complete()
• buf/buf0flu.cc: buf_flush_single_page_from_LRU()
Single Page Flush
1/31/2017 89
Full scan
• buf/buf0flu.cc: buf_flush_single_page_from_LRU()
Single Page Flush
1/31/2017 90
Check whether we can flush current block and ready for flush
Try to flush it; write to disk
...
• buf/buf0flu.cc: buf_flush_ready_for_flush()
Single Page Flush
1/31/2017 91
If the page is already flushed or doing IO, return false
...
• buf/buf0flu.cc: buf_flush_page()
– Writes a flushable page from the buffer pool to a file
Single Page Flush
1/31/2017 92
Get lock
...
• buf/buf0flu.cc: buf_flush_page()
Single Page Flush
1/31/2017 93
Set fix and flush type
...
...
• buf/buf0flu.cc: buf_flush_write_block_low()
Single Page Flush
1/31/2017 94
Flush log (transaction log – WAL)
• buf/buf0flu.cc: buf_flush_write_block_low()
Single Page Flush
1/31/2017 95
Doublewrite off case
Write the page to dwb, then write to datafile;See this in dwb part
• buf/buf0flu.cc: buf_flush_write_block_low()
Single Page Flush
1/31/2017 96
Sync buffered write to disk; call fsync by fil_flush()
FLUSH A PAGEPART 2 : BATCH FLUSH
1/31/2017 97
Batch Flush
• Background flusher (= page cleaner thread)
– Independent thread for flushing a dirty pages from buffer pools to storage
– Regularly (per 1000ms) do flush from LRU tail or
– Do flush by dirty page percent (configurable)
• Thread definition
– buf/buf0flu.cc: DECLARE_THREAD(buf_flush_page_cleaner_thread)
1/31/2017 98
...
• buf/buf0flu.cc: DECLARE_THREAD(buf_flush_page_cleaner_thread)
Background Flusher
1/31/2017 99
Run until shutdown
• buf/buf0flu.cc: DECLARE_THREAD(buf_flush_page_cleaner_thread)
Background Flusher
1/31/2017 100
Nothing hasbeen changed
Something hasbeen changed!
• buf/buf0flu.cc: buf_flush_LRU_tail()
• Clears up tail of the LRU lists:
– Put replaceable pages at the tail of LRU to the free list
– Flush dirty pages at the tail of LRU to the disk
• srv_LRU_scan_depth
– Scan each buffer pool at this amount
– Configuable: innodb_LRU_scan_depth
LRU List Batch Flush
1/31/2017 101
LRU List Batch Flush
1/31/2017 102
buf_flush_LRU_tail()
buf_flush_LRU()
buf_flush_batch()
buf_do_LRU_batch()
buf_flush_LRU_list_batch()
buf_LRU_free_page()
fil_io()
fil_flush()
buf_page_io_complete()
buf_flush_page_and_try_neighbors()
buf_flush_page()
buf_flush_write_block_low()
buf_dblwr_add_to_batch()
buf_dblwr_flush_
buffered_writes()
• buf/buf0flu.cc: buf_flush_LRU_tail()
LRU List Batch Flush
1/31/2017 103
Per buffer pool instance
• buf/buf0flu.cc: buf_flush_LRU_tail()
LRU List Batch Flush
1/31/2017 104
Chunk size = 100
• buf/buf0flu.cc: buf_flush_LRU()
LRU List Batch Flush
1/31/2017 105
...
Batch LRU flush
• buf/buf0flu.cc: buf_flush_batch()
LRU List Batch Flush
1/31/2017 106
Do LRU batch
• buf/buf0flu.cc: buf_do_LRU_batch()
LRU List Batch Flush
1/31/2017 107
• buf/buf0flu.cc: buf_flush_LRU_list_batch()
LRU List Batch Flush
1/31/2017 108
Get the last page from LRU
• buf/buf0flu.cc: buf_flush_LRU_list_batch()
LRU List Batch Flush
1/31/2017 109
• buf/buf0flu.cc: buf_flush_ready_for_replace()
LRU List Batch Flush
1/31/2017 110
Check whether current page is clean page or not
• buf/buf0flu.cc: buf_flush_LRU_list_batch()
LRU List Batch Flush
1/31/2017 111
It there is any replaceable page, free the page
• buf/buf0flu.cc: buf_flush_LRU_list_batch()
LRU List Batch Flush
1/31/2017 112
Else, try to flush neighbor pages
• buf/buf0flu.cc: buf_flush_page_and_try_neighbors()
LRU List Batch Flush
1/31/2017 113
Flush page, but no sync
• buf/buf0flu.cc: buf_flush_write_block_low()
LRU List Batch Flush
1/31/2017 114
Doublewrite off case
Add the page to the dwb buffer;See this in dwb part
LRU List Batch Flush
• Now, victim pages are gathered for replacement
• We need to flush them to disk
• We can do this by calling buf_flush_common()
1/31/2017 115
• buf/buf0flu.cc: buf_flush_LRU()
LRU List Batch Flush
1/31/2017 116
...
• buf/buf0flu.cc: buf_flush_common()
LRU List Batch Flush
1/31/2017 117
• flush all pages we gathered so far• write the pages to dwb area first• then issue it to datafile; See this later
DOUBLEWRITE BUFFERPART 1 : ARCHITECTURE
1/31/2017 118
Double Write Buffer
• To avoid torn page (partial page) written problem
• Write dirty pages to special storage area in system tablespace priori to write database file
1/31/2017 119
Database on Flash SSD
DatabaseBuffer
TailHead D D D
Main LRU List
Free list
Dirty Page Set
D D
Scan LRU List from tail
D
Double Write Buffer
ibdata1-space0datafiles
DWB Architecture
1/31/2017 120
• include/buf0dblwr.cc: buf_dblwr_t
DWB Struct
1/31/2017 121
• include/buf0dblwr.cc: buf_dblwr_t
DWB Struct
1/31/2017 122
DOUBLEWRITE BUFFERPART 2 : SINGLE PAGE FLUSH
1/31/2017 123
• buf/buf0flu.cc: buf_flush_write_block_low()
Single Page Flush
1/31/2017 124
...
• buf/buf0dblwr.cc: buf_dblwr_write_single_page()
Single Page Flush
1/31/2017 125
# of slots for single page flush= 2 * DOUBLEWRITE_BLOCK_SIZE – BATCH_SIZE= 128 – 120= 8
• buf/buf0dblwr.cc: buf_dblwr_write_single_page()
Single Page Flush
1/31/2017 126
If all slots are reserved, wait until current
dblwr done
Find a free slot
• buf/buf0dblwr.cc: buf_dblwr_write_single_page()
Single Page Flush
1/31/2017 127
Write to datafile(synchronous)
Sync system tablespace for dwb area in disk
Write block to dwb area in disk(synchronous)
...
• buf/buf0dblwr.cc: buf_dblwr_write_block_to_datafile()
Single Page Flush
1/31/2017 128
Issue write operation to datafile
In single page flush, sync = true
• buf/buf0flu.cc: buf_flush_write_block_low()
Single Page Flush
1/31/2017 129
Sync buffered write to disk; call fsync by fil_flush()
Single Page Flush
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 130
Get io type (In this case, BUF_IO_WRITE)
Single Page Flush
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 131
Set io fix to BUF_IO_NONE
Single Page Flush
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 132
Single Page Flush
• buf/buf0flu.cc: buf_flush_write_complete()
1/31/2017 133
Remove the block from the flush list
…
Single Page Flush
• buf/buf0dblwr.cc: buf_dblwr_update()
1/31/2017 134
Free the dwb slot of the target page
Single Page Flush
• Single page flush is performed in the context of the query thread itself
• Single page flush mode iterates over the LRU list of a buffer pool instance, while holding the buffer pool mutex
• It might have trouble in getting a free doublewrite buffer slot (total 8 slots)
• In result, it makes the overall performance worse
1/31/2017 135
DOUBLEWRITE BUFFERPART 3 : BATCH FLUSH
1/31/2017 136
• buf/buf0flu.cc: buf_flush_write_block_low()
Batch Flush
1/31/2017 137
• buf/buf0dblwr.cc: buf_dblwr_add_to_batch()
Batch Flush
1/31/2017 138
If another batch is already running, wait until done
• buf/buf0dblwr.cc: buf_dblwr_add_to_batch()
Batch Flush
1/31/2017 139
If all slots for batch flush in dwb buffer is reserved,
flush dwb buffer
• buf/buf0dblwr.cc: buf_dblwr_add_to_batch()
Batch Flush
1/31/2017 140
After flushing, copy current block to buf_dblwr->first_free
• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()
Batch Flush
1/31/2017 141
Doublewrite off case
• buf/buf0dblwr.cc: buf_dblwr_sync_datafiles()
Batch Flush
1/31/2017 142
fil_flush() os_file_flush() os_file_fsync() fsync()
• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()
Batch Flush
1/31/2017 143
Change batch running status
Exit mutex
Nobody won’t be here except me!
• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()
Batch Flush
1/31/2017 144
Issue write op for block1(synchronous)
If current write uses only block1, then flush
• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()
Batch Flush
1/31/2017 145
Issue write op for block2(synchronous)
...
• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()
Batch Flush
1/31/2017 146
Flush (fsync) system table space
Write all blocks to datafile(asynchronous)
• After submitting aio requests,
– fil_aio_wait()
– buf_page_io_complete()
– buf_flush_write_complete()
– buf_dblwr_update()
Batch Flush
1/31/2017 147
• buf/buf0dblwr.cc: buf_dblwr_update()
Batch Flush
1/31/2017 148
Flush datafile
Reset dwb
SYNCHRONIZATION
1/31/2017 149
InnoDB Synchronization
• InnoDB implements its own mutexes & RW-locks for buffer management
• Latch in InnoDB
– A lightweight structure used by InnoDB to implement a lock
– Typically held for a brief time (milliseconds or microseconds)
– A general term that includes both mutexes (for exclusive access) and rw-locks (for shared access)
1/31/2017 150
Mutex in InnoDB
• The low-level object to represent and enforce exclusive-access locks to internal in-memory data structures
• Once the lock is acquired, any other process, thread, and so on is prevented from acquiring the same lock
1/31/2017 151
Mutex in InnoDB
1/31/2017 152
• Example code in InnoDB
– buf/buf0buf.cc: buf_wait_for_read()
Get current IO fix of the block
Mutex in InnoDB
1/31/2017 153
• Example code in InnoDB
– buf/buf0flu.cc: buf_flush_batch()
Flush the pages in the buffer pool
RW-lock in InnoDB
• The low-level object to represent and enforce shared-access locks to internal in-memory data structures
• RW-lock includes three types of locks
– S-locks (shared locks)
– X-locks (exclusive locks)
– SX-locks (shared-exclusive locks)
1/31/2017 154
S SX X
S Compatible Compatible Conflict
SX Compatible Conflict Conflict
X Conflict Conflict Conflict
RW-lock in InnoDB
• S-lock (Shared-lock) – provides read access to a common resource
• X-lock (eXclusive-lock)– provides write access to a common resource
– while not permitting inconsistent reads by other threads
• SX-lock (Shared-eXclusive lock)– provides write access to a common resource
– while permitting inconsistent reads by other threads
– introduced in MySQL 5.7 to optimize concurrency and improve scalability for read-write workloads.
1/31/2017 155
RW-lock in InnoDB
1/31/2017 156
• Example code in InnoDB (S-lock)
– buf/buf0buf.cc: buf_page_get_gen()
…
Search hash table
RW-lock in InnoDB
1/31/2017 157
• Example code in InnoDB (X-lock)
– buf/buf0buf.cc: buf_page_init_for_read()
…
Insert a page into the hash table
Top Related