MySQL Buffer Management

157
1/31/2017 1 MySQL Buffer Management Mijin Ahn [email protected]

Transcript of MySQL Buffer Management

Page 1: MySQL Buffer Management

1/31/2017 1

MySQL Buffer Management

Mijin [email protected]

Page 2: MySQL Buffer Management

Contents

1/31/2017 2

• Overview

• Buffer Pool

• Buffer Read

• LRU Replacement

• Flusher

• Doublewrite Buffer

• Synchronization

Page 3: MySQL Buffer Management

OVERVIEW

1/31/2017 3

Page 4: MySQL Buffer Management

Overview

1/31/2017 4

• Buffer Pool

– Considering memory hierarchy

– Caching frequently accessed data into DRAM like a cache memory in CPU

– Exploit locality

Page 5: MySQL Buffer Management

Overview

• InnoDB Architecture

1/31/2017 5

Handler API

Transaction (trx)

Logging & Crash

Recovery(log & recv)

Mini Transaction

(mtr)

Lock(lck)

Cursor (cur)

Row (row)

B-tree (btr)

Page (page)

Buffer Manager (buf)

Free space / File Management (fsp / fil)

IO

Page 6: MySQL Buffer Management

Overview

• Buffer Manager– Buffer Pool (buf0buf.cc) : buffer pool manager

– Buffer Read (buf0read.cc) : read buffer

– LRU (buf0lru.cc) : buffer replacement

1/31/2017 6

Page 7: MySQL Buffer Management

Overview

• Buffer Manager– Flusher (buf0flu.cc) : dirty page writer & background flusher

– Doublewrite (buf0dblwr.cc) : doublewrite buffer

1/31/2017 7

Page 8: MySQL Buffer Management

BUFFER POOL

1/31/2017 8

Page 9: MySQL Buffer Management

Lists of Buffer Blocks

• Free list

– Contains free page frames

• LRU list

– Contains all the blocks holding a file page

• Flush list

– Contains the blocks holding file pages that have been modified in the memory but not written to disk yet

1/31/2017 9

DatabaseBuffer

TailHead D D D

Main LRU list

Free list

D

Flush list D D

Page 10: MySQL Buffer Management

Buffer Pool Mutex

• The buffer buf_pool contains a single mutex

– buf_pool->mutex: protects all the control data structures of the buf_pool

• The buf_pool->mutex is a hot-spot in main memory

– Causing a lot of memory bus traffic on multiprocessor systems when processors alternatively access the mutex

– A solution to reduce mutex contention• To create a separate lock for the page hash table

1/31/2017 10

Page 11: MySQL Buffer Management

Buffer Pool Struct

• include/buf0buf.h: buf_pool_t

1/31/2017 11

Page 12: MySQL Buffer Management

Buffer Pool Struct

• include/buf0buf.h: buf_pool_t

1/31/2017 12

Page 13: MySQL Buffer Management

Buffer Pool Struct

• include/buf0buf.h: buf_pool_t

1/31/2017 13

Page 14: MySQL Buffer Management

Buffer Pool Struct

• include/buf0buf.h: buf_pool_t

1/31/2017 14

Page 15: MySQL Buffer Management

Buffer Pool Init

• buf/buf0buf.cc: buf_pool_init()

1/31/2017 15

...

Buffer pool init per instance

Page 16: MySQL Buffer Management

Buffer Pool Init

• buf/buf0buf.cc: buf_pool_init_instance()

1/31/2017 16

Create mutex for buffer pool

Page 17: MySQL Buffer Management

Buffer Pool Init

• buf/buf0buf.cc: buf_pool_init_instance()

1/31/2017 17

Initialize buffer chunk

Page 18: MySQL Buffer Management

Buffer Pool Init

• buf/buf0buf.cc: buf_pool_init_instance()

1/31/2017 18

Create page hash table

Page 19: MySQL Buffer Management

Buffer Chunk

• Total buffer pool size

= x * innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size

1/31/2017 19

Page 20: MySQL Buffer Management

Buffer Chunk Struct

• include/buf0buf.ic: buf_chunk_t

1/31/2017 20

[0]

[…]

[N]

[0] [1] [3][2]

[…] […] [N][…]blocks

frames

Buffer chunk mem

Page 21: MySQL Buffer Management

• buf/buf0buf.cc: buf_chunk_init()

Buffer Pool Init

1/31/2017 21

Allocate chunk mem(blocks + frames)

Page 22: MySQL Buffer Management

• buf/buf0buf.cc: buf_chunk_init()

Buffer Pool Init

1/31/2017 22

Allocate control blocks

Allocate frames(Page size is aligned)

Page 23: MySQL Buffer Management

• buf/buf0buf.cc: buf_chunk_init()

Buffer Pool Init

1/31/2017 23

Add all blocks to free list

Initialize control block

Page 24: MySQL Buffer Management

Buffer Control Block (BCB)

• The control block contains

– Read-write lock

– Buffer fix count• Which is incremented when a thread wants a file page to be fixed in a buffer

frame

• The buffer fix operation does not lock the contents of the frame

– Page frame

– File pages• Put to a hash table according to the file address of the page

1/31/2017 24

Page 25: MySQL Buffer Management

Buffer Control Block (BCB)

1/31/2017 25

Page

Frame ptr

Mutex

RW lock

Lock hash value

buf_block_t

Table space ID

Page offset

Buffer fix count

IO fix

State

Hash

List (LRU, free, flush)

buf_page_t

Page 26: MySQL Buffer Management

• include/buf0buf.h: buf_block_t

BCB Struct

1/31/2017 26

Page 27: MySQL Buffer Management

• include/buf0buf.h: buf_block_t

BCB Struct

1/31/2017 27

Page 28: MySQL Buffer Management

• include/buf0buf.h: buf_page_t

BCB Struct

1/31/2017 28

Page identification

Page 29: MySQL Buffer Management

• include/buf0buf.h: buf_page_t

BCB Struct

1/31/2017 29

...

...

Page 30: MySQL Buffer Management

• include/buf0buf.h: buf_page_t

BCB Struct

1/31/2017 30

Page 31: MySQL Buffer Management

• buf/buf0buf.cc: buf_block_init()

BCB Init

1/31/2017 31

Set data frame

... Create block mutex & rw lock

Page 32: MySQL Buffer Management

BUFFER READPART 1 : READ A PAGE

1/31/2017 32

Page 33: MySQL Buffer Management

MySQL Buffer Manager: Read

Database on Flash SSD

DatabaseBuffer

3. Read a page

Tail

1. Search free list

Head D D D

Main LRU List

Free list

Dirty Page Set

D D

Scan LRU List from tail

Double Write Buffer

2. Flush Dirty Pages

D

Page 34: MySQL Buffer Management

Buffer Read

• Read a page (buf0rea.cc)

– Find a certain page in the buffer pool using hash table

– If it is not in the buffer pool, then read a block from the storage• Allocate a free block for read (include buffer block)

• Two cases

– Buffer pool has free blocks

– Buffer pool doesn’t have a free block

• Read a page

1/31/2017 34

Page 35: MySQL Buffer Management

Buffer Read

1/31/2017 35

Page 36: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 36

Get the buffer pool ptr using space & offset

** 2 important things **1) ID of a page is (space, offset) of the page2) Buffer pool – page mapping is mapped

Page 37: MySQL Buffer Management

Buffer Read

• include/buf0buf.ic: buf_pool_get()

1/31/2017 37

Make a fold number

Page 38: MySQL Buffer Management

Buffer Read

• Why fold?

– They want to put pages together in the same buffer pool if it is the same extents for read ahead

1/31/2017 38

Page 0

Page 1

Page 63

Page 64

Page 65

Page 127

Fold 0 Fold 1

Page 39: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 39

Get page hash lock before searching in the hash table

Set shared lock on hash table

Page 40: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 40

...

...

Find a page in the hash table

Page doesn’t exist in buffer pool

Read the page from the storage

retry

success

Page 41: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 41

Fail

Page 42: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 42

If it failed to read target page, go to the first part of the loop

Page 43: MySQL Buffer Management

Buffer Read

• buf/buf0rea.cc: buf_read_page()

1/31/2017 43

Page 44: MySQL Buffer Management

Buffer Read

• buf/buf0rea.cc: buf_read_page_low()

1/31/2017 44

Page 45: MySQL Buffer Management

Buffer Read

• buf/buf0rea.cc: buf_read_page_low()

1/31/2017 45

Allocate buffer block for read

Page 46: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_init_for_read()

1/31/2017 46

Page 47: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_init_for_read()

1/31/2017 47

Get free block: see this later

Page 48: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_init_for_read()

1/31/2017 48

Initialize buffer page for current read

Page 49: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_init()

1/31/2017 49

Page 50: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_init()

1/31/2017 50

Page 51: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_init()

1/31/2017 51

Insert a page into hash table

Page 52: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_init_for_read()

1/31/2017 52

Add current block to LRU list: see this later

Set io fix to BUF_IO_READ

Page 53: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_init_for_read()

1/31/2017 53

Increase pending read count;How many buffer read were requested but not finished

Page 54: MySQL Buffer Management

Buffer Read

• We allocate a free buffer and control block

• And the block was inserted into hash table and LRU list of corresponding buffer pool

• Now, we need to read the real content of the target page from the storage

1/31/2017 54

Page 55: MySQL Buffer Management

Buffer Read

• buf/buf0rea.cc: buf_read_page_low()

1/31/2017 55

Read a page from storage: see this later

Page 56: MySQL Buffer Management

Buffer Read

• buf/buf0rea.cc: buf_read_page_low()

1/31/2017 56

Complete the read request

Page 57: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 57

Get io type (In this case, BUF_IO_READ)

Page 58: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 58

Get io type (In this case, BUF_IO_READ)

...

Page corruption check based on checksum in the page

Page 59: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 59

Set io fix to BUF_IO_NONE

Page 60: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 60

Decrease pending read count

Page 61: MySQL Buffer Management

BUFFER READPART 2 : AFTER GOT BLOCK

1/31/2017 61

Page 62: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 62

...

Set access time

Page 63: MySQL Buffer Management

Buffer Read

• buf/buf0buf.cc: buf_page_get_gen()

1/31/2017 63

Do read ahead process (default=false)

Page 64: MySQL Buffer Management

LRU REPLACEMENTPART 1 : ADD BLOCK

1/31/2017 64

Page 65: MySQL Buffer Management

LRU Add Block

• buf/buf0buf.cc: buf_page_init_for_read()

1/31/2017 65

Add current block to LRU list

Page 66: MySQL Buffer Management

LRU Add Block

• buf/buf0lru.cc: buf_LRU_add_block()

1/31/2017 66

Page 67: MySQL Buffer Management

LRU Add Block

• buf/buf0lru.cc: buf_LRU_add_block_low()

1/31/2017 67

If list is too small, then put current block to first of the list

Page 68: MySQL Buffer Management

LRU Add Block

• buf/buf0lru.cc: buf_LRU_add_block_low()

1/31/2017 68

Else, insert current block to after LRU_old pointer

Page 69: MySQL Buffer Management

LRU REPLACEMENTPART 2 : GET FREE BLOCK

1/31/2017 69

Page 70: MySQL Buffer Management

LRU Get Free Block

• This function is called from a user thread when it needs a clean block to read in a page

– Note that we only ever get a block from the free list

– Even when we flush a page or find a page in LRU scan we put it to free list to be used

1/31/2017 70

Page 71: MySQL Buffer Management

LRU Get Free Block

• iteration 0:

– get a block from free list, success: done

– if there is an LRU flush batch in progress:• wait for batch to end: retry free list

– if buf_pool->try_LRU_scan is set• scan LRU up to srv_LRU_scan_depth to find a clean block

• the above will put the block on free list

• success: retry the free list

– flush one dirty page from tail of LRU to disk (= single page flush)• the above will put the block on free list

• success: retry the free list

1/31/2017 71

Page 72: MySQL Buffer Management

LRU Get Free Block

• iteration 1:

– same as iteration 0 except:• scan whole LRU list

• iteration > 1:

– same as iteration 1 but sleep 100ms

1/31/2017 72

Page 73: MySQL Buffer Management

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_get_free_block()

1/31/2017 73

Get buffer pool mutex

Get a free block from the free list

Page 74: MySQL Buffer Management

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_get_free_block()

1/31/2017 74

Getting a free block succeeded

Page 75: MySQL Buffer Management

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_get_free_block()

1/31/2017 75

If already background flushed started, wait for it to end

Page 76: MySQL Buffer Management

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_get_free_block()

1/31/2017 76

Find a victim page to replace and make a free block

Page 77: MySQL Buffer Management

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_scan_and_free_block()

1/31/2017 77

Page 78: MySQL Buffer Management

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_scan_and_free_block()

1/31/2017 78

Page 79: MySQL Buffer Management

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_free_from_common_LRU_list()

1/31/2017 79

Try to free it

Page 80: MySQL Buffer Management

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_free_page()

1/31/2017 80

Get hash lock & block mutex

Page 81: MySQL Buffer Management

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_free_page()

1/31/2017 81

If current page is dirty and not flushed to disk yet, exit

Page 82: MySQL Buffer Management

• buf/buf0lru.cc: buf_LRU_free_page()– After func_exit, we are on clean case!

LRU Get Free Block

1/31/2017 82

Page 83: MySQL Buffer Management

...

...

...

• buf/buf0lru.cc: buf_LRU_block_remove_hashed()

LRU Get Free Block

1/31/2017 83

Page 84: MySQL Buffer Management

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_get_free_block()

1/31/2017 84

...

If we have free block(s), go to loop

Page 85: MySQL Buffer Management

LRU Get Free Block

• buf/buf0lru.cc: buf_LRU_get_free_block()

1/31/2017 85

If we failed to make a free block, do a single page flush

Page 86: MySQL Buffer Management

FLUSH A PAGEPART 1 : SINGLE PAGE FLUSH

1/31/2017 86

Page 87: MySQL Buffer Management

Flush a Page

• Flushing a page by

– A background flusher batch flush (LRU & flush list)

– A single page flush in LRU_get_free_block()

• Background flusher

– Regularly check system status (per 1000ms)

– Flush all buffer pool instances in a batch manner

1/31/2017 87

Page 88: MySQL Buffer Management

Single Page Flush

1/31/2017 88

buf_LRU_get_free_block()

buf_flush_single_page_from_LRU()

buf_flush_ready_for_flush() buf_flush

_page()buf_flush_write_block_low()

buf_dblwr_write_single_page()

fil_io()

fil_flush()

buf_page_io_complete()

Page 89: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_single_page_from_LRU()

Single Page Flush

1/31/2017 89

Full scan

Page 90: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_single_page_from_LRU()

Single Page Flush

1/31/2017 90

Check whether we can flush current block and ready for flush

Try to flush it; write to disk

Page 91: MySQL Buffer Management

...

• buf/buf0flu.cc: buf_flush_ready_for_flush()

Single Page Flush

1/31/2017 91

If the page is already flushed or doing IO, return false

Page 92: MySQL Buffer Management

...

• buf/buf0flu.cc: buf_flush_page()

– Writes a flushable page from the buffer pool to a file

Single Page Flush

1/31/2017 92

Get lock

Page 93: MySQL Buffer Management

...

• buf/buf0flu.cc: buf_flush_page()

Single Page Flush

1/31/2017 93

Set fix and flush type

Page 94: MySQL Buffer Management

...

...

• buf/buf0flu.cc: buf_flush_write_block_low()

Single Page Flush

1/31/2017 94

Flush log (transaction log – WAL)

Page 95: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_write_block_low()

Single Page Flush

1/31/2017 95

Doublewrite off case

Write the page to dwb, then write to datafile;See this in dwb part

Page 96: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_write_block_low()

Single Page Flush

1/31/2017 96

Sync buffered write to disk; call fsync by fil_flush()

Page 97: MySQL Buffer Management

FLUSH A PAGEPART 2 : BATCH FLUSH

1/31/2017 97

Page 98: MySQL Buffer Management

Batch Flush

• Background flusher (= page cleaner thread)

– Independent thread for flushing a dirty pages from buffer pools to storage

– Regularly (per 1000ms) do flush from LRU tail or

– Do flush by dirty page percent (configurable)

• Thread definition

– buf/buf0flu.cc: DECLARE_THREAD(buf_flush_page_cleaner_thread)

1/31/2017 98

Page 99: MySQL Buffer Management

...

• buf/buf0flu.cc: DECLARE_THREAD(buf_flush_page_cleaner_thread)

Background Flusher

1/31/2017 99

Run until shutdown

Page 100: MySQL Buffer Management

• buf/buf0flu.cc: DECLARE_THREAD(buf_flush_page_cleaner_thread)

Background Flusher

1/31/2017 100

Nothing hasbeen changed

Something hasbeen changed!

Page 101: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_LRU_tail()

• Clears up tail of the LRU lists:

– Put replaceable pages at the tail of LRU to the free list

– Flush dirty pages at the tail of LRU to the disk

• srv_LRU_scan_depth

– Scan each buffer pool at this amount

– Configuable: innodb_LRU_scan_depth

LRU List Batch Flush

1/31/2017 101

Page 102: MySQL Buffer Management

LRU List Batch Flush

1/31/2017 102

buf_flush_LRU_tail()

buf_flush_LRU()

buf_flush_batch()

buf_do_LRU_batch()

buf_flush_LRU_list_batch()

buf_LRU_free_page()

fil_io()

fil_flush()

buf_page_io_complete()

buf_flush_page_and_try_neighbors()

buf_flush_page()

buf_flush_write_block_low()

buf_dblwr_add_to_batch()

buf_dblwr_flush_

buffered_writes()

Page 103: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_LRU_tail()

LRU List Batch Flush

1/31/2017 103

Per buffer pool instance

Page 104: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_LRU_tail()

LRU List Batch Flush

1/31/2017 104

Chunk size = 100

Page 105: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_LRU()

LRU List Batch Flush

1/31/2017 105

...

Batch LRU flush

Page 106: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_batch()

LRU List Batch Flush

1/31/2017 106

Do LRU batch

Page 107: MySQL Buffer Management

• buf/buf0flu.cc: buf_do_LRU_batch()

LRU List Batch Flush

1/31/2017 107

Page 108: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_LRU_list_batch()

LRU List Batch Flush

1/31/2017 108

Get the last page from LRU

Page 109: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_LRU_list_batch()

LRU List Batch Flush

1/31/2017 109

Page 110: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_ready_for_replace()

LRU List Batch Flush

1/31/2017 110

Check whether current page is clean page or not

Page 111: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_LRU_list_batch()

LRU List Batch Flush

1/31/2017 111

It there is any replaceable page, free the page

Page 112: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_LRU_list_batch()

LRU List Batch Flush

1/31/2017 112

Else, try to flush neighbor pages

Page 113: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_page_and_try_neighbors()

LRU List Batch Flush

1/31/2017 113

Flush page, but no sync

Page 114: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_write_block_low()

LRU List Batch Flush

1/31/2017 114

Doublewrite off case

Add the page to the dwb buffer;See this in dwb part

Page 115: MySQL Buffer Management

LRU List Batch Flush

• Now, victim pages are gathered for replacement

• We need to flush them to disk

• We can do this by calling buf_flush_common()

1/31/2017 115

Page 116: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_LRU()

LRU List Batch Flush

1/31/2017 116

...

Page 117: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_common()

LRU List Batch Flush

1/31/2017 117

• flush all pages we gathered so far• write the pages to dwb area first• then issue it to datafile; See this later

Page 118: MySQL Buffer Management

DOUBLEWRITE BUFFERPART 1 : ARCHITECTURE

1/31/2017 118

Page 119: MySQL Buffer Management

Double Write Buffer

• To avoid torn page (partial page) written problem

• Write dirty pages to special storage area in system tablespace priori to write database file

1/31/2017 119

Database on Flash SSD

DatabaseBuffer

TailHead D D D

Main LRU List

Free list

Dirty Page Set

D D

Scan LRU List from tail

D

Double Write Buffer

ibdata1-space0datafiles

Page 120: MySQL Buffer Management

DWB Architecture

1/31/2017 120

Page 121: MySQL Buffer Management

• include/buf0dblwr.cc: buf_dblwr_t

DWB Struct

1/31/2017 121

Page 122: MySQL Buffer Management

• include/buf0dblwr.cc: buf_dblwr_t

DWB Struct

1/31/2017 122

Page 123: MySQL Buffer Management

DOUBLEWRITE BUFFERPART 2 : SINGLE PAGE FLUSH

1/31/2017 123

Page 124: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_write_block_low()

Single Page Flush

1/31/2017 124

Page 125: MySQL Buffer Management

...

• buf/buf0dblwr.cc: buf_dblwr_write_single_page()

Single Page Flush

1/31/2017 125

# of slots for single page flush= 2 * DOUBLEWRITE_BLOCK_SIZE – BATCH_SIZE= 128 – 120= 8

Page 126: MySQL Buffer Management

• buf/buf0dblwr.cc: buf_dblwr_write_single_page()

Single Page Flush

1/31/2017 126

If all slots are reserved, wait until current

dblwr done

Find a free slot

Page 127: MySQL Buffer Management

• buf/buf0dblwr.cc: buf_dblwr_write_single_page()

Single Page Flush

1/31/2017 127

Write to datafile(synchronous)

Sync system tablespace for dwb area in disk

Write block to dwb area in disk(synchronous)

Page 128: MySQL Buffer Management

...

• buf/buf0dblwr.cc: buf_dblwr_write_block_to_datafile()

Single Page Flush

1/31/2017 128

Issue write operation to datafile

In single page flush, sync = true

Page 129: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_write_block_low()

Single Page Flush

1/31/2017 129

Sync buffered write to disk; call fsync by fil_flush()

Page 130: MySQL Buffer Management

Single Page Flush

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 130

Get io type (In this case, BUF_IO_WRITE)

Page 131: MySQL Buffer Management

Single Page Flush

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 131

Set io fix to BUF_IO_NONE

Page 132: MySQL Buffer Management

Single Page Flush

• buf/buf0buf.cc: buf_page_io_complete()

1/31/2017 132

Page 133: MySQL Buffer Management

Single Page Flush

• buf/buf0flu.cc: buf_flush_write_complete()

1/31/2017 133

Remove the block from the flush list

Page 134: MySQL Buffer Management

Single Page Flush

• buf/buf0dblwr.cc: buf_dblwr_update()

1/31/2017 134

Free the dwb slot of the target page

Page 135: MySQL Buffer Management

Single Page Flush

• Single page flush is performed in the context of the query thread itself

• Single page flush mode iterates over the LRU list of a buffer pool instance, while holding the buffer pool mutex

• It might have trouble in getting a free doublewrite buffer slot (total 8 slots)

• In result, it makes the overall performance worse

1/31/2017 135

Page 136: MySQL Buffer Management

DOUBLEWRITE BUFFERPART 3 : BATCH FLUSH

1/31/2017 136

Page 137: MySQL Buffer Management

• buf/buf0flu.cc: buf_flush_write_block_low()

Batch Flush

1/31/2017 137

Page 138: MySQL Buffer Management

• buf/buf0dblwr.cc: buf_dblwr_add_to_batch()

Batch Flush

1/31/2017 138

If another batch is already running, wait until done

Page 139: MySQL Buffer Management

• buf/buf0dblwr.cc: buf_dblwr_add_to_batch()

Batch Flush

1/31/2017 139

If all slots for batch flush in dwb buffer is reserved,

flush dwb buffer

Page 140: MySQL Buffer Management

• buf/buf0dblwr.cc: buf_dblwr_add_to_batch()

Batch Flush

1/31/2017 140

After flushing, copy current block to buf_dblwr->first_free

Page 141: MySQL Buffer Management

• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()

Batch Flush

1/31/2017 141

Doublewrite off case

Page 142: MySQL Buffer Management

• buf/buf0dblwr.cc: buf_dblwr_sync_datafiles()

Batch Flush

1/31/2017 142

fil_flush() os_file_flush() os_file_fsync() fsync()

Page 143: MySQL Buffer Management

• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()

Batch Flush

1/31/2017 143

Change batch running status

Exit mutex

Nobody won’t be here except me!

Page 144: MySQL Buffer Management

• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()

Batch Flush

1/31/2017 144

Issue write op for block1(synchronous)

If current write uses only block1, then flush

Page 145: MySQL Buffer Management

• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()

Batch Flush

1/31/2017 145

Issue write op for block2(synchronous)

Page 146: MySQL Buffer Management

...

• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()

Batch Flush

1/31/2017 146

Flush (fsync) system table space

Write all blocks to datafile(asynchronous)

Page 147: MySQL Buffer Management

• After submitting aio requests,

– fil_aio_wait()

– buf_page_io_complete()

– buf_flush_write_complete()

– buf_dblwr_update()

Batch Flush

1/31/2017 147

Page 148: MySQL Buffer Management

• buf/buf0dblwr.cc: buf_dblwr_update()

Batch Flush

1/31/2017 148

Flush datafile

Reset dwb

Page 149: MySQL Buffer Management

SYNCHRONIZATION

1/31/2017 149

Page 150: MySQL Buffer Management

InnoDB Synchronization

• InnoDB implements its own mutexes & RW-locks for buffer management

• Latch in InnoDB

– A lightweight structure used by InnoDB to implement a lock

– Typically held for a brief time (milliseconds or microseconds)

– A general term that includes both mutexes (for exclusive access) and rw-locks (for shared access)

1/31/2017 150

Page 151: MySQL Buffer Management

Mutex in InnoDB

• The low-level object to represent and enforce exclusive-access locks to internal in-memory data structures

• Once the lock is acquired, any other process, thread, and so on is prevented from acquiring the same lock

1/31/2017 151

Page 152: MySQL Buffer Management

Mutex in InnoDB

1/31/2017 152

• Example code in InnoDB

– buf/buf0buf.cc: buf_wait_for_read()

Get current IO fix of the block

Page 153: MySQL Buffer Management

Mutex in InnoDB

1/31/2017 153

• Example code in InnoDB

– buf/buf0flu.cc: buf_flush_batch()

Flush the pages in the buffer pool

Page 154: MySQL Buffer Management

RW-lock in InnoDB

• The low-level object to represent and enforce shared-access locks to internal in-memory data structures

• RW-lock includes three types of locks

– S-locks (shared locks)

– X-locks (exclusive locks)

– SX-locks (shared-exclusive locks)

1/31/2017 154

S SX X

S Compatible Compatible Conflict

SX Compatible Conflict Conflict

X Conflict Conflict Conflict

Page 155: MySQL Buffer Management

RW-lock in InnoDB

• S-lock (Shared-lock) – provides read access to a common resource

• X-lock (eXclusive-lock)– provides write access to a common resource

– while not permitting inconsistent reads by other threads

• SX-lock (Shared-eXclusive lock)– provides write access to a common resource

– while permitting inconsistent reads by other threads

– introduced in MySQL 5.7 to optimize concurrency and improve scalability for read-write workloads.

1/31/2017 155

Page 156: MySQL Buffer Management

RW-lock in InnoDB

1/31/2017 156

• Example code in InnoDB (S-lock)

– buf/buf0buf.cc: buf_page_get_gen()

Search hash table

Page 157: MySQL Buffer Management

RW-lock in InnoDB

1/31/2017 157

• Example code in InnoDB (X-lock)

– buf/buf0buf.cc: buf_page_init_for_read()

Insert a page into the hash table