MySQL Buffer Management

Post on 24-Jan-2018

369 views 7 download

Transcript of MySQL Buffer Management

1/31/2017 1

MySQL Buffer Management

Mijin Ahnmeeeeejin@gmail.com

Contents

1/31/2017 2

• Overview

• Buffer Pool

• Buffer Read

• LRU Replacement

• Flusher

• Doublewrite Buffer

• Synchronization

OVERVIEW

1/31/2017 3

Overview

1/31/2017 4

• Buffer Pool

– Considering memory hierarchy

– Caching frequently accessed data into DRAM like a cache memory in CPU

– Exploit locality

Overview

• InnoDB Architecture

1/31/2017 5

Handler API

Transaction (trx)

Logging & Crash

Recovery(log & recv)

Mini Transaction

(mtr)

Lock(lck)

Cursor (cur)

Row (row)

B-tree (btr)

Page (page)

Buffer Manager (buf)

Free space / File Management (fsp / fil)

IO

Overview

• Buffer Manager– Buffer Pool (buf0buf.cc) : buffer pool manager

– Buffer Read (buf0read.cc) : read buffer

– LRU (buf0lru.cc) : buffer replacement

1/31/2017 6

Overview

• Buffer Manager– Flusher (buf0flu.cc) : dirty page writer & background flusher

– Doublewrite (buf0dblwr.cc) : doublewrite buffer

1/31/2017 7

BUFFER POOL

1/31/2017 8

Lists of Buffer Blocks

• Free list

– Contains free page frames

• LRU list

– Contains all the blocks holding a file page

• Flush list

– Contains the blocks holding file pages that have been modified in the memory but not written to disk yet

1/31/2017 9

DatabaseBuffer

TailHead D D D

Main LRU list

Free list

D

Flush list D D

Buffer Pool Mutex

• The buffer buf_pool contains a single mutex

– buf_pool->mutex: protects all the control data structures of the buf_pool

• The buf_pool->mutex is a hot-spot in main memory

– Causing a lot of memory bus traffic on multiprocessor systems when processors alternatively access the mutex

– A solution to reduce mutex contention• To create a separate lock for the page hash table

1/31/2017 10

Buffer Pool Struct

• include/buf0buf.h: buf_pool_t

1/31/2017 11

Buffer Pool Struct

• include/buf0buf.h: buf_pool_t

1/31/2017 12

Buffer Pool Struct

• include/buf0buf.h: buf_pool_t

1/31/2017 13

Buffer Pool Struct

• include/buf0buf.h: buf_pool_t

1/31/2017 14

Buffer Pool Init

• buf/buf0buf.cc: buf_pool_init()

1/31/2017 15

...

Buffer pool init per instance

Buffer Pool Init

• buf/buf0buf.cc: buf_pool_init_instance()

1/31/2017 16

Create mutex for buffer pool

Buffer Pool Init

• buf/buf0buf.cc: buf_pool_init_instance()

1/31/2017 17

Initialize buffer chunk

Buffer Pool Init

• buf/buf0buf.cc: buf_pool_init_instance()

1/31/2017 18

Create page hash table

Buffer Chunk

• Total buffer pool size

= x * innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size

1/31/2017 19

Buffer Chunk Struct

• include/buf0buf.ic: buf_chunk_t

1/31/2017 20

[0]

[…]

[N]

[0] [1] [3][2]

[…] […] [N][…]blocks

frames

Buffer chunk mem

• buf/buf0buf.cc: buf_chunk_init()

Buffer Pool Init

1/31/2017 21

Allocate chunk mem(blocks + frames)

• buf/buf0buf.cc: buf_chunk_init()

Buffer Pool Init

1/31/2017 22

Allocate control blocks

Allocate frames(Page size is aligned)

• buf/buf0buf.cc: buf_chunk_init()

Buffer Pool Init

1/31/2017 23

Add all blocks to free list

Initialize control block

Buffer Control Block (BCB)

• The control block contains

– Read-write lock

– Buffer fix count• Which is incremented when a thread wants a file page to be fixed in a buffer

frame

• The buffer fix operation does not lock the contents of the frame

– Page frame

– File pages• Put to a hash table according to the file address of the page

1/31/2017 24

Buffer Control Block (BCB)

1/31/2017 25

Page

Frame ptr

Mutex

RW lock

Lock hash value

buf_block_t

Table space ID

Page offset

Buffer fix count

IO fix

State

Hash

List (LRU, free, flush)

buf_page_t

• include/buf0buf.h: buf_block_t

BCB Struct

1/31/2017 26

• include/buf0buf.h: buf_block_t

BCB Struct

1/31/2017 27

• include/buf0buf.h: buf_page_t

BCB Struct

1/31/2017 28

Page identification

• include/buf0buf.h: buf_page_t

BCB Struct

1/31/2017 29

...

...

• include/buf0buf.h: buf_page_t

BCB Struct

1/31/2017 30

• buf/buf0buf.cc: buf_block_init()

BCB Init

1/31/2017 31

Set data frame

... Create block mutex & rw lock

BUFFER READPART 1 : READ A PAGE

1/31/2017 32

MySQL Buffer Manager: Read

Database on Flash SSD

DatabaseBuffer

3. Read a page

Tail

1. Search free list

Head D D D

Main LRU List

Free list

Dirty Page Set

D D

Scan LRU List from tail

Double Write Buffer

2. Flush Dirty Pages

D

Buffer Read

• Read a page (buf0rea.cc)

– Find a certain page in the buffer pool using hash table

– If it is not in the buffer pool, then read a block from the storage• Allocate a free block for read (include buffer block)

• Two cases

– Buffer pool has free blocks

– Buffer pool doesn’t have a free block

• Read a page

1/31/2017 34

Buffer Read

1/31/2017 35

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 36

Get the buffer pool ptr using space & offset

** 2 important things **1) ID of a page is (space, offset) of the page2) Buffer pool – page mapping is mapped

Buffer Read

• include/buf0buf.ic: buf_pool_get()

1/31/2017 37

Make a fold number

Buffer Read

• Why fold?

– They want to put pages together in the same buffer pool if it is the same extents for read ahead

1/31/2017 38

Page 0

Page 1

Page 63

Page 64

Page 65

Page 127

Fold 0 Fold 1

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 39

Get page hash lock before searching in the hash table

Set shared lock on hash table

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 40

...

...

Find a page in the hash table

Page doesn’t exist in buffer pool

Read the page from the storage

retry

success

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 41

Fail

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 42

If it failed to read target page, go to the first part of the loop

Buffer Read

• buf/buf0rea.cc: buf_read_page()

1/31/2017 43

Buffer Read

• buf/buf0rea.cc: buf_read_page_low()

1/31/2017 44

Buffer Read

• buf/buf0rea.cc: buf_read_page_low()

1/31/2017 45

Allocate buffer block for read

Buffer Read

• buf/buf0buf.cc: buf_page_init_for_read()

1/31/2017 46

Buffer Read

• buf/buf0buf.cc: buf_page_init_for_read()

1/31/2017 47

Get free block: see this later

Buffer Read

• buf/buf0buf.cc: buf_page_init_for_read()

1/31/2017 48

Initialize buffer page for current read

Buffer Read

• buf/buf0buf.cc: buf_page_init()

1/31/2017 49

Buffer Read

• buf/buf0buf.cc: buf_page_init()

1/31/2017 50

Buffer Read

• buf/buf0buf.cc: buf_page_init()

1/31/2017 51

Insert a page into hash table

Buffer Read

• buf/buf0buf.cc: buf_page_init_for_read()

1/31/2017 52

Add current block to LRU list: see this later

Set io fix to BUF_IO_READ

Buffer Read

• buf/buf0buf.cc: buf_page_init_for_read()

1/31/2017 53

Increase pending read count;How many buffer read were requested but not finished

Buffer Read

• We allocate a free buffer and control block

• And the block was inserted into hash table and LRU list of corresponding buffer pool

• Now, we need to read the real content of the target page from the storage

1/31/2017 54

Buffer Read

• buf/buf0rea.cc: buf_read_page_low()

1/31/2017 55

Read a page from storage: see this later

Buffer Read

• buf/buf0rea.cc: buf_read_page_low()

1/31/2017 56

Complete the read request

Buffer Read

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 57

Get io type (In this case, BUF_IO_READ)

Buffer Read

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 58

Get io type (In this case, BUF_IO_READ)

...

Page corruption check based on checksum in the page

Buffer Read

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 59

Set io fix to BUF_IO_NONE

Buffer Read

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 60

Decrease pending read count

BUFFER READPART 2 : AFTER GOT BLOCK

1/31/2017 61

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 62

...

Set access time

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 63

Do read ahead process (default=false)

LRU REPLACEMENTPART 1 : ADD BLOCK

1/31/2017 64

LRU Add Block

• buf/buf0buf.cc: buf_page_init_for_read()

1/31/2017 65

Add current block to LRU list

LRU Add Block

• buf/buf0lru.cc: buf_LRU_add_block()

1/31/2017 66

LRU Add Block

• buf/buf0lru.cc: buf_LRU_add_block_low()

1/31/2017 67

If list is too small, then put current block to first of the list

LRU Add Block

• buf/buf0lru.cc: buf_LRU_add_block_low()

1/31/2017 68

Else, insert current block to after LRU_old pointer

LRU REPLACEMENTPART 2 : GET FREE BLOCK

1/31/2017 69

LRU Get Free Block

• This function is called from a user thread when it needs a clean block to read in a page

– Note that we only ever get a block from the free list

– Even when we flush a page or find a page in LRU scan we put it to free list to be used

1/31/2017 70

LRU Get Free Block

• iteration 0:

– get a block from free list, success: done

– if there is an LRU flush batch in progress:• wait for batch to end: retry free list

– if buf_pool->try_LRU_scan is set• scan LRU up to srv_LRU_scan_depth to find a clean block

• the above will put the block on free list

• success: retry the free list

– flush one dirty page from tail of LRU to disk (= single page flush)• the above will put the block on free list

• success: retry the free list

1/31/2017 71

LRU Get Free Block

• iteration 1:

– same as iteration 0 except:• scan whole LRU list

• iteration > 1:

– same as iteration 1 but sleep 100ms

1/31/2017 72

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_get_free_block()

1/31/2017 73

Get buffer pool mutex

Get a free block from the free list

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_get_free_block()

1/31/2017 74

Getting a free block succeeded

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_get_free_block()

1/31/2017 75

If already background flushed started, wait for it to end

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_get_free_block()

1/31/2017 76

Find a victim page to replace and make a free block

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_scan_and_free_block()

1/31/2017 77

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_scan_and_free_block()

1/31/2017 78

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_free_from_common_LRU_list()

1/31/2017 79

Try to free it

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_free_page()

1/31/2017 80

Get hash lock & block mutex

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_free_page()

1/31/2017 81

If current page is dirty and not flushed to disk yet, exit

• buf/buf0lru.cc: buf_LRU_free_page()– After func_exit, we are on clean case!

LRU Get Free Block

1/31/2017 82

...

...

...

• buf/buf0lru.cc: buf_LRU_block_remove_hashed()

LRU Get Free Block

1/31/2017 83

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_get_free_block()

1/31/2017 84

...

If we have free block(s), go to loop

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_get_free_block()

1/31/2017 85

If we failed to make a free block, do a single page flush

FLUSH A PAGEPART 1 : SINGLE PAGE FLUSH

1/31/2017 86

Flush a Page

• Flushing a page by

– A background flusher batch flush (LRU & flush list)

– A single page flush in LRU_get_free_block()

• Background flusher

– Regularly check system status (per 1000ms)

– Flush all buffer pool instances in a batch manner

1/31/2017 87

Single Page Flush

1/31/2017 88

buf_LRU_get_free_block()

buf_flush_single_page_from_LRU()

buf_flush_ready_for_flush() buf_flush

_page()buf_flush_write_block_low()

buf_dblwr_write_single_page()

fil_io()

fil_flush()

buf_page_io_complete()

• buf/buf0flu.cc: buf_flush_single_page_from_LRU()

Single Page Flush

1/31/2017 89

Full scan

• buf/buf0flu.cc: buf_flush_single_page_from_LRU()

Single Page Flush

1/31/2017 90

Check whether we can flush current block and ready for flush

Try to flush it; write to disk

...

• buf/buf0flu.cc: buf_flush_ready_for_flush()

Single Page Flush

1/31/2017 91

If the page is already flushed or doing IO, return false

...

• buf/buf0flu.cc: buf_flush_page()

– Writes a flushable page from the buffer pool to a file

Single Page Flush

1/31/2017 92

Get lock

...

• buf/buf0flu.cc: buf_flush_page()

Single Page Flush

1/31/2017 93

Set fix and flush type

...

...

• buf/buf0flu.cc: buf_flush_write_block_low()

Single Page Flush

1/31/2017 94

Flush log (transaction log – WAL)

• buf/buf0flu.cc: buf_flush_write_block_low()

Single Page Flush

1/31/2017 95

Doublewrite off case

Write the page to dwb, then write to datafile;See this in dwb part

• buf/buf0flu.cc: buf_flush_write_block_low()

Single Page Flush

1/31/2017 96

Sync buffered write to disk; call fsync by fil_flush()

FLUSH A PAGEPART 2 : BATCH FLUSH

1/31/2017 97

Batch Flush

• Background flusher (= page cleaner thread)

– Independent thread for flushing a dirty pages from buffer pools to storage

– Regularly (per 1000ms) do flush from LRU tail or

– Do flush by dirty page percent (configurable)

• Thread definition

– buf/buf0flu.cc: DECLARE_THREAD(buf_flush_page_cleaner_thread)

1/31/2017 98

...

• buf/buf0flu.cc: DECLARE_THREAD(buf_flush_page_cleaner_thread)

Background Flusher

1/31/2017 99

Run until shutdown

• buf/buf0flu.cc: DECLARE_THREAD(buf_flush_page_cleaner_thread)

Background Flusher

1/31/2017 100

Nothing hasbeen changed

Something hasbeen changed!

• buf/buf0flu.cc: buf_flush_LRU_tail()

• Clears up tail of the LRU lists:

– Put replaceable pages at the tail of LRU to the free list

– Flush dirty pages at the tail of LRU to the disk

• srv_LRU_scan_depth

– Scan each buffer pool at this amount

– Configuable: innodb_LRU_scan_depth

LRU List Batch Flush

1/31/2017 101

LRU List Batch Flush

1/31/2017 102

buf_flush_LRU_tail()

buf_flush_LRU()

buf_flush_batch()

buf_do_LRU_batch()

buf_flush_LRU_list_batch()

buf_LRU_free_page()

fil_io()

fil_flush()

buf_page_io_complete()

buf_flush_page_and_try_neighbors()

buf_flush_page()

buf_flush_write_block_low()

buf_dblwr_add_to_batch()

buf_dblwr_flush_

buffered_writes()

• buf/buf0flu.cc: buf_flush_LRU_tail()

LRU List Batch Flush

1/31/2017 103

Per buffer pool instance

• buf/buf0flu.cc: buf_flush_LRU_tail()

LRU List Batch Flush

1/31/2017 104

Chunk size = 100

• buf/buf0flu.cc: buf_flush_LRU()

LRU List Batch Flush

1/31/2017 105

...

Batch LRU flush

• buf/buf0flu.cc: buf_flush_batch()

LRU List Batch Flush

1/31/2017 106

Do LRU batch

• buf/buf0flu.cc: buf_do_LRU_batch()

LRU List Batch Flush

1/31/2017 107

• buf/buf0flu.cc: buf_flush_LRU_list_batch()

LRU List Batch Flush

1/31/2017 108

Get the last page from LRU

• buf/buf0flu.cc: buf_flush_LRU_list_batch()

LRU List Batch Flush

1/31/2017 109

• buf/buf0flu.cc: buf_flush_ready_for_replace()

LRU List Batch Flush

1/31/2017 110

Check whether current page is clean page or not

• buf/buf0flu.cc: buf_flush_LRU_list_batch()

LRU List Batch Flush

1/31/2017 111

It there is any replaceable page, free the page

• buf/buf0flu.cc: buf_flush_LRU_list_batch()

LRU List Batch Flush

1/31/2017 112

Else, try to flush neighbor pages

• buf/buf0flu.cc: buf_flush_page_and_try_neighbors()

LRU List Batch Flush

1/31/2017 113

Flush page, but no sync

• buf/buf0flu.cc: buf_flush_write_block_low()

LRU List Batch Flush

1/31/2017 114

Doublewrite off case

Add the page to the dwb buffer;See this in dwb part

LRU List Batch Flush

• Now, victim pages are gathered for replacement

• We need to flush them to disk

• We can do this by calling buf_flush_common()

1/31/2017 115

• buf/buf0flu.cc: buf_flush_LRU()

LRU List Batch Flush

1/31/2017 116

...

• buf/buf0flu.cc: buf_flush_common()

LRU List Batch Flush

1/31/2017 117

• flush all pages we gathered so far• write the pages to dwb area first• then issue it to datafile; See this later

DOUBLEWRITE BUFFERPART 1 : ARCHITECTURE

1/31/2017 118

Double Write Buffer

• To avoid torn page (partial page) written problem

• Write dirty pages to special storage area in system tablespace priori to write database file

1/31/2017 119

Database on Flash SSD

DatabaseBuffer

TailHead D D D

Main LRU List

Free list

Dirty Page Set

D D

Scan LRU List from tail

D

Double Write Buffer

ibdata1-space0datafiles

DWB Architecture

1/31/2017 120

• include/buf0dblwr.cc: buf_dblwr_t

DWB Struct

1/31/2017 121

• include/buf0dblwr.cc: buf_dblwr_t

DWB Struct

1/31/2017 122

DOUBLEWRITE BUFFERPART 2 : SINGLE PAGE FLUSH

1/31/2017 123

• buf/buf0flu.cc: buf_flush_write_block_low()

Single Page Flush

1/31/2017 124

...

• buf/buf0dblwr.cc: buf_dblwr_write_single_page()

Single Page Flush

1/31/2017 125

# of slots for single page flush= 2 * DOUBLEWRITE_BLOCK_SIZE – BATCH_SIZE= 128 – 120= 8

• buf/buf0dblwr.cc: buf_dblwr_write_single_page()

Single Page Flush

1/31/2017 126

If all slots are reserved, wait until current

dblwr done

Find a free slot

• buf/buf0dblwr.cc: buf_dblwr_write_single_page()

Single Page Flush

1/31/2017 127

Write to datafile(synchronous)

Sync system tablespace for dwb area in disk

Write block to dwb area in disk(synchronous)

...

• buf/buf0dblwr.cc: buf_dblwr_write_block_to_datafile()

Single Page Flush

1/31/2017 128

Issue write operation to datafile

In single page flush, sync = true

• buf/buf0flu.cc: buf_flush_write_block_low()

Single Page Flush

1/31/2017 129

Sync buffered write to disk; call fsync by fil_flush()

Single Page Flush

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 130

Get io type (In this case, BUF_IO_WRITE)

Single Page Flush

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 131

Set io fix to BUF_IO_NONE

Single Page Flush

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 132

Single Page Flush

• buf/buf0flu.cc: buf_flush_write_complete()

1/31/2017 133

Remove the block from the flush list

Single Page Flush

• buf/buf0dblwr.cc: buf_dblwr_update()

1/31/2017 134

Free the dwb slot of the target page

Single Page Flush

• Single page flush is performed in the context of the query thread itself

• Single page flush mode iterates over the LRU list of a buffer pool instance, while holding the buffer pool mutex

• It might have trouble in getting a free doublewrite buffer slot (total 8 slots)

• In result, it makes the overall performance worse

1/31/2017 135

DOUBLEWRITE BUFFERPART 3 : BATCH FLUSH

1/31/2017 136

• buf/buf0flu.cc: buf_flush_write_block_low()

Batch Flush

1/31/2017 137

• buf/buf0dblwr.cc: buf_dblwr_add_to_batch()

Batch Flush

1/31/2017 138

If another batch is already running, wait until done

• buf/buf0dblwr.cc: buf_dblwr_add_to_batch()

Batch Flush

1/31/2017 139

If all slots for batch flush in dwb buffer is reserved,

flush dwb buffer

• buf/buf0dblwr.cc: buf_dblwr_add_to_batch()

Batch Flush

1/31/2017 140

After flushing, copy current block to buf_dblwr->first_free

• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()

Batch Flush

1/31/2017 141

Doublewrite off case

• buf/buf0dblwr.cc: buf_dblwr_sync_datafiles()

Batch Flush

1/31/2017 142

fil_flush() os_file_flush() os_file_fsync() fsync()

• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()

Batch Flush

1/31/2017 143

Change batch running status

Exit mutex

Nobody won’t be here except me!

• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()

Batch Flush

1/31/2017 144

Issue write op for block1(synchronous)

If current write uses only block1, then flush

• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()

Batch Flush

1/31/2017 145

Issue write op for block2(synchronous)

...

• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()

Batch Flush

1/31/2017 146

Flush (fsync) system table space

Write all blocks to datafile(asynchronous)

• After submitting aio requests,

– fil_aio_wait()

– buf_page_io_complete()

– buf_flush_write_complete()

– buf_dblwr_update()

Batch Flush

1/31/2017 147

• buf/buf0dblwr.cc: buf_dblwr_update()

Batch Flush

1/31/2017 148

Flush datafile

Reset dwb

SYNCHRONIZATION

1/31/2017 149

InnoDB Synchronization

• InnoDB implements its own mutexes & RW-locks for buffer management

• Latch in InnoDB

– A lightweight structure used by InnoDB to implement a lock

– Typically held for a brief time (milliseconds or microseconds)

– A general term that includes both mutexes (for exclusive access) and rw-locks (for shared access)

1/31/2017 150

Mutex in InnoDB

• The low-level object to represent and enforce exclusive-access locks to internal in-memory data structures

• Once the lock is acquired, any other process, thread, and so on is prevented from acquiring the same lock

1/31/2017 151

Mutex in InnoDB

1/31/2017 152

• Example code in InnoDB

– buf/buf0buf.cc: buf_wait_for_read()

Get current IO fix of the block

Mutex in InnoDB

1/31/2017 153

• Example code in InnoDB

– buf/buf0flu.cc: buf_flush_batch()

Flush the pages in the buffer pool

RW-lock in InnoDB

• The low-level object to represent and enforce shared-access locks to internal in-memory data structures

• RW-lock includes three types of locks

– S-locks (shared locks)

– X-locks (exclusive locks)

– SX-locks (shared-exclusive locks)

1/31/2017 154

S SX X

S Compatible Compatible Conflict

SX Compatible Conflict Conflict

X Conflict Conflict Conflict

RW-lock in InnoDB

• S-lock (Shared-lock) – provides read access to a common resource

• X-lock (eXclusive-lock)– provides write access to a common resource

– while not permitting inconsistent reads by other threads

• SX-lock (Shared-eXclusive lock)– provides write access to a common resource

– while permitting inconsistent reads by other threads

– introduced in MySQL 5.7 to optimize concurrency and improve scalability for read-write workloads.

1/31/2017 155

RW-lock in InnoDB

1/31/2017 156

• Example code in InnoDB (S-lock)

– buf/buf0buf.cc: buf_page_get_gen()

Search hash table

RW-lock in InnoDB

1/31/2017 157

• Example code in InnoDB (X-lock)

– buf/buf0buf.cc: buf_page_init_for_read()

Insert a page into the hash table