An Efficient Backup and Replication of Storage

54
An Efficient Backup and Replication of Storage Takashi HOSHINO Cybozu Labs, Inc. 2013-05-29@LinuxConJapan rev.20130529a

description

Talk slides for LinuxCon Japan 2013.

Transcript of An Efficient Backup and Replication of Storage

Page 1: An Efficient Backup and Replication of Storage

An Efficient Backup and Replication of Storage

Takashi HOSHINO

Cybozu Labs, Inc.

2013-05-29@LinuxConJapan

rev.20130529a

Page 2: An Efficient Backup and Replication of Storage

Self-introduction

• Takashi HOSHINO

– Works at Cybozu Labs, Inc.

• Technical interests

– Database, storage, distributed algorithms

• Current work

– WalB: Today’s talk!

2

Page 3: An Efficient Backup and Replication of Storage

Contents

• Motivation

• Architecture

• Alternative solutions

• Algorithm

• Performance evaluation

• Summary

3

Page 4: An Efficient Backup and Replication of Storage

Contents

• Motivation

• Architecture

• Alternative solutions

• Algorithm

• Performance evaluation

• Summary

4

Page 5: An Efficient Backup and Replication of Storage

Motivation

• Backup and replication are vital

– We have our own cloud infrastructure

– Requiring high availability of customers’ data

– With cost-effective commodity hardware and software

5

Operating data

Backup data

Replicated data

Primary DC Secondary DC

Page 6: An Efficient Backup and Replication of Storage

Requirements

• Functionality

– Getting consistent diff data for backup/replication achieving small RPO

• Performance

– Without usual full-scans

– Small overhead

• Various kinds of data support

– Databases

– Blob data

– Full-text-search indexes

– ... 6

Page 7: An Efficient Backup and Replication of Storage

• A Linux kernel device driver to provide wrapper block devices and related userland tools

• Provides consistent diff extraction functionality for efficient backup and replication

• “WalB” means “Block-level WAL”

A solution: WalB

7

WalB storage

Backup storage

Replicated storage

Diffs

Page 8: An Efficient Backup and Replication of Storage

WAL (Write-Ahead Logging)

8

Ordinary Storage WAL Storage

Write at 0

Write at 2

Read at 2

Write at 2

0

0 2

0 2

0 2 2

Time

Page 9: An Efficient Backup and Replication of Storage

How to get diffs

• Full-scan and compare

• Partial-scan with bitmaps (or indexes)

• Scan logs directly from WAL storages

9

a b c d e a b’ c d e’ b’ e’ 1 4

a b c d e a b’ c d e’ b’ e’ 1 4

00000 01001

a b c d e a b’ c d e’ b’ e’ 1 4

b’ e’ 1 4

Data at t0 Data at t1 Diffs from t0 to t1

Page 10: An Efficient Backup and Replication of Storage

Contents

• Motivation

• Architecture

• Alternative solutions

• Algorithm

• Performance evaluation

• Summary

10

Page 11: An Efficient Backup and Replication of Storage

WalB architecture

A walb device as a wrapper

A block device for data (Data device)

A block device for logs (Log device)

Read Write Logs

Not special format An original format

Any application (File system, DBMS, etc)

Walb log Extractor

Walb dev controller

Control

11

Page 12: An Efficient Backup and Replication of Storage

Log device format

12

Ring buffer Metadata

Address

Superblock (physical block size)

Unused (4KiB)

Log address = (log sequence id) % (ring buffer size) + (ring buffer start address)

Page 13: An Efficient Backup and Replication of Storage

Checksum

Ring buffer inside

13

2nd written data

Ring buffer

Log pack Logpack header block

1st written data …

The oldest logpack The latest logpack

1st log record

IO address IO size

...

Logpack lsid Log pack header block

Num of records

Total IO size

2nd log record

IO address IO size

...

...

Page 14: An Efficient Backup and Replication of Storage

Redo/undo logs

• Redo logs

– to go forward

• Undo logs

– to go backward

14

0 2 Data at t0

0 2

Data at t1

Redo logs

Undo logs

Page 15: An Efficient Backup and Replication of Storage

How to create redo/undo logs

• Undo logs requires additional read IOs

• Walb devices do not generate undo-logs

15

Data Log

WalB

(1) write IOs submitted

(2) read current data

(3) write undo logs

Data Log

WalB

(1) write IOs submitted

(2) write redo logs

Generating redo logs Generating undo logs

Page 16: An Efficient Backup and Replication of Storage

Consistent full backup

• The ring buffer must have enough capacity to store the logs generated from t0 to t1

Primary host Backup host

Walb device (online)

Log device

Full archive

Logs

Apply

(A)

(B)

Read the online volume (inconsistent)

t1

Get consistent image at t1

(C)

(A) (B) (C)

Time

t0 t2

Get logs from t0 to t1

16

Page 17: An Efficient Backup and Replication of Storage

Consistent incremental backup

• To manage multiple backup generations, defer application or generate undo-logs during application

Primary host Backup host

Walb device (online)

Log device

Full archive

Logs

Apply (A)

Get logs from t0 to t1

(B)

(A) (B)

Time

t0 t2 t1

Previous backup timestamp

Backuped Application can be deferred

17

Page 18: An Efficient Backup and Replication of Storage

Backup and replication

18

Primary host Backup host

Walb device (online)

Log device

Full archive

Logs

Apply (A)

Get logs from t0 to t1 Replicated

(A’)

(A)

Time

t1 t3

Remote host

Full archive

Logs

Apply (B’) (B)

(B) t2

Backuped

t0

Replication delay

Page 19: An Efficient Backup and Replication of Storage

Contents

• Motivation

• Architecture

• Alternative solutions

• Algorithm

• Performance evaluation

• Summary

19

Page 20: An Efficient Backup and Replication of Storage

Alternative solutions

• DRBD

– Specialized for replication

– DRBD proxy is required for long-distance replication

• Dm-snap

– Snapshot management using COW (copy-on-write)

– Full-scans are required to get diffs

• Dm-thin

– Snapshot management using COW and reference counters

– Fragmentation is inevitable

20

Page 21: An Efficient Backup and Replication of Storage

Alternatives comparison

WalB

dm-snap

Capability

dm-thin

21

DRBD

Incr. backup

Sync repl-

ication

Async repl-

ication

Performance

Negligible

Search idx

Negligible

Read response overhead

Search idx

Write response overhead

Fragment-ation

Write log instead data

Modify idx (+COW)

Send IOs to slaves

(async repl.)

Modify idx (+COW)

Never

Inevitable

Never

Never (original lv)

Page 22: An Efficient Backup and Replication of Storage

WalB pros and cons

• Pros

– Small response overhead

– Fragmentation never occur

– Logs can be retrieved with sequential scans

• Cons

– 2x bandwidth is required for writes

22

Page 23: An Efficient Backup and Replication of Storage

WalB source code statistics

23

File systems Block device wrappers

Page 24: An Efficient Backup and Replication of Storage

Contents

• Motivation

• Architecture

• Alternative solutions

• Algorithm

• Performance evaluation

• Summary

24

Page 25: An Efficient Backup and Replication of Storage

Requirements to be a block device

• Read/write consistency

– It must read the latest written data

• Storage state uniqueness

– It must replay logs not changing the history

• Durability of flushed data

– It must make flushed write IOs be persistent

• Crash recovery without undo

– It must recover the data using redo-logs only

25

Page 26: An Efficient Backup and Replication of Storage

WalB algorithm

• Two IO processing methods

– Easy: very simple, large overhead

– Fast: a bit complex, small overhead

• Overlapped IO serialization

• Flush/fua for durability

• Crash recovery and checkpointing

26

Page 27: An Efficient Backup and Replication of Storage

IO processing flow (easy algorithm)

27

Submitted

Time

Completed

Packed

Log submitted Log completed

Wait for log flushed and overlapped IOs done

Data submitted Data completed

Write

Submitted

Time

Completed

Data submitted Data completed

Read

Log IO response Data IO response

WalB write IO response

Data IO response

Page 28: An Efficient Backup and Replication of Storage

IO processing flow (fast algorithm)

28

Submitted

Time

Completed

Packed

Log submitted Log completed

Wait for log flushed and overlapped IOs done

Data submitted Data completed

Log IO response Data IO response

WalB write IO response

Pdata inserted Pdata deleted

Write

Submitted

Time

Completed

(Data submitted) (Data completed)

Read

Pdata copied

Data IO response

Page 29: An Efficient Backup and Replication of Storage

Pending data

• A red-black tree

– provided as a kernel library

• Sorted by IO address

– to find overlapped IOs quickly

• A spinlock

– for exclusive accesses

29

Node0

addr size data

NodeN

addr size data

Node1

addr size data

...

...

Page 30: An Efficient Backup and Replication of Storage

Overlapped IO serialization

• Required for storage state uniqueness

• Oldata (overlapped data)

– similar to pdata

– counter for each IO

– FIFO constraint

31

Time

Wait for overlapped IOs done

Data submitted Data completed

Data IO response

Oldata inserted Oldata deleted Got notice Sent notice

Page 31: An Efficient Backup and Replication of Storage

Flush/fua for durability

• The condition for a log to be persistent

– All logs before the log and itself are persistent

• Neither FLUSH nor FUA is required for data device IOs

– Data device persistence will be guaranteed by checkpointing

34

Property to guarantee

All write IOs submitted before the flush IO are

persistent REQ_FLUSH

What WalB need to do

The fua IO is persistent REQ_FUA

Set FLUSH flag of the corresponding log IOs

Set FLUSH and FUA flags to the corresponding log IOs

Page 32: An Efficient Backup and Replication of Storage

Crash recovery and checkpointing

• Crash recovery

– Crash will make recent write IOs not be persistent in the data device

– Generate write IOs from recent logs and execute them in the data device

• Checkpointing

– Sync the data device and update the superblock periodically

– The superblock contains recent logs information

35

Page 33: An Efficient Backup and Replication of Storage

Contents

• Motivation

• Architecture

• Alternative solutions

• Algorithm

• Performance evaluation

• Summary

36

Page 34: An Efficient Backup and Replication of Storage

Experimental environment

• Host

– CPU: Intel core i7 3930K

– Memory: DDR3-10600 32GB (8GBx4)

– OS: Ubuntu 12.04 x86_64

– Kernel: Custom build 3.2.24

• Storage HBA

– Intel X79 Internal SATA controller

– Using SATA2 interfaces for the experiment

37

Page 35: An Efficient Backup and Replication of Storage

Benchmark software

• Self-developed IO benchmark for block devices

– https://github.com/starpos/ioreth

– uses DIRECT_IO to eliminate buffer cache effects

• Parameters

– Pattern: Random/sequential

– Mode: Read/write

– Block size: 512B, 4KB, 32KB, 256KB

– Concurrency: 1-32

38

Page 36: An Efficient Backup and Replication of Storage

Target storage devices

• MEM

– Memory block devices (self-implemented)

• HDD

– Seagate Barracuda 500GB (ST500DM002)

– Up to 140MB/s

• SSD

– Intel 330 60GB (SSDSC2CT060A3)

– Up to 250MB/s with SATA2

39

Page 37: An Efficient Backup and Replication of Storage

Storage settings

• Baseline

– A raw storage device

• Wrap

– Self-developed simple wrapper

– Request/bio interfaces

• WalB

– Easy/easy-ol/fast/fast-ol

– Request/bio interfaces

40

Log Data

Data

Baseline

WalB driver

WalB

Data

Wrap

Wrap driver

Page 38: An Efficient Backup and Replication of Storage

WalB parameters

• Log-flush-disabled experiments

– Target storage: MEMs/HDDs/SSDs

– Pdata size: 4-128MiB

• Log-flush-enabled experiments

– Target storage: HDDs/SSDs

– Pdata size: 4-64MiB

– Flush interval size: 4-32MiB

– Flush interval period: 10ms, 100ms, 1s

41

Page 39: An Efficient Backup and Replication of Storage

MEM 512B random: response

42

Read Write

• Smaller overhead with bio interface than with request interface

• Serializing write IOs of WalB seems to enlarge response time as number of threads increases

Page 40: An Efficient Backup and Replication of Storage

MEM 512B random: IOPS

43

Read Write

• Large overhead with request interface

• Pdata search seems overhead of walb-bio

• IOPS of walb decreases as num of threads increases due to decrease of cache-hit ratio or increase of spinlock wait time

Queue length Queue length

Page 41: An Efficient Backup and Replication of Storage

Pdata and oldata overhead

44

MEM 512B random write

Kernel 3.2.24, walb req prototype

Overhead of oldata

Overhead of pdata

Queue length

Page 42: An Efficient Backup and Replication of Storage

HDD 4KB random: IOPS

45

Read Write

• Negligible overhead • IO scheduling effect was observed especially with walb-req

Queue length Queue length

Page 43: An Efficient Backup and Replication of Storage

HDD 256KB sequential: Bps

46

Read Write

• Request interface is better

• IO size is limit to 32KiB with bio interface

• Additional log header blocks decrease throughput

Queue length Queue length

Page 44: An Efficient Backup and Replication of Storage

SSD 4KB random: IOPS

47

Read Write

• WalB performance is almost the same as that of wrap req

• Fast algo. is better then easy algo.

• IOPS overhead is large with smaller number of threads

Queue length Queue length

Page 45: An Efficient Backup and Replication of Storage

SSD 4KB random: IOPS (two partitions in a SSD)

48

Read Write

• Almost the same result as with two SSD drives

• A half throughput was observed

• Bandwidth of a SSD is the bottleneck

Queue length Queue length

Page 46: An Efficient Backup and Replication of Storage

SSD 256KB sequential: Bps

49

Read Write

• Non-negligible overhead was observed with queue length 1

• Fast is better than easy

• Larger overhead of walb with smaller queue length

Queue length Queue length

Page 47: An Efficient Backup and Replication of Storage

Log-flush effects with HDDs

50

IOPS (4KB random write) Bps (256KB sequential write)

• IO sorting for the data device is effective for random writes

• Enough memory for pdata is required to minimize the log-flush overhead

Queue length Queue length

Page 48: An Efficient Backup and Replication of Storage

Log-flush effects with SSDs

51

IOPS (4KB random write) Bps (32KB sequential write)

• Small flush interval is better

• Log flush increases IOPS with single-thread

• Log-flush effect is negligible

Queue length Queue length

Page 49: An Efficient Backup and Replication of Storage

Evaluation summary

• WalB overhead

– Non-negligible for writes with small concurrency

– Log-flush overhead is large with HDDs, negligible with SSDs

• Request vs bio interface

– bio is better except for workloads with large IO size

• Easy vs fast algorithm

– Fast is better

52

Page 50: An Efficient Backup and Replication of Storage

Contents

• Motivation

• Architecture

• Alternative solutions

• Algorithm

• Performance evaluation

• Summary

53

Page 51: An Efficient Backup and Replication of Storage

WalB summary

• A wrapper block device driver for

– incremental backup

– asynchronous replication

• Small performance overhead with

– No persistent indexes

– No undo-logs

– No fragmentation

54

Page 52: An Efficient Backup and Replication of Storage

Current status

• Version 1.0

– For Linux kernel 3.2+ and x86_64 architecture

– Userland tools are minimal

• Improve userland tools

– Faster extraction/application of logs

– Logical/physical compression

– Backup/replication managers

• Submit kernel patches

55

Page 53: An Efficient Backup and Replication of Storage

Future work

• Add all-zero flag to the log record format

– to avoid all-zero blocks storing to the log device

• Add bitmap management

– to avoid full-scans in ring buffer overflow

• (Support snapshot access)

– by implementing pluggable persistent address indexes

• (Support thin provisioning)

– if a clever defragmentation algorithm was available

56

Page 54: An Efficient Backup and Replication of Storage

Thank you for your attention!

• GitHub repository:

– https://github.com/starpos/walb/

• Contact to me:

– Email: hoshino AT labs.cybozu.co.jp

– Twitter: @starpoz (hashtag: #walbdev)

57