Transcript of "Improving Copy-on-Write Performance in Container Storage Drivers", presented by Frank Zhao*, Kevin Xu, and Randy Shain (EMC Corp, now Dell EMC) at the 2016 Storage Developer Conference.

Page 1

2016 Storage Developer Conference. © EMC Corp All Rights Reserved.

Improving Copy-on-Write Performance in Container Storage Drivers

Frank Zhao*, Kevin Xu, Randy Shain EMC Corp (Now Dell EMC)

Page 2

Disclaimer

THE TECHNOLOGY CONCEPTS BEING DISCUSSED AND DEMONSTRATED ARE THE RESULT OF RESEARCH CONDUCTED BY THE ADVANCED RESEARCH & DEVELOPMENT (ARD) TEAM FROM THE EMC OFFICE OF THE CTO. ANY DEMONSTRATED CAPABILITY IS ONLY FOR RESEARCH PURPOSES AND AT A PROTOTYPE PHASE, THEREFORE: THERE ARE NO IMMEDIATE PLANS NOR INDICATION OF SUCH PLANS FOR PRODUCTIZATION OF THESE CAPABILITIES AT THE TIME OF PRESENTATION. THINGS MAY OR MAY NOT CHANGE IN THE FUTURE.


Page 3

Outline

- Container (Docker) storage drivers
- Copy-on-Write performance drawback
- Our solution: Data Relationship and Reference (DRR)
- Design & prototyping with DevMapper
- Test results: launch storm, data access, I/O heatmap
- Summary and future work


Page 4

Container/Docker Image and Layers


[Figure: an image and its layers (e.g. Ubuntu 15.04): three read-only layers L1 (File 1, 2, 3), L2 (File 1, 4, 6), L3 (File 2, 5, 6); the running container's logical view is the merged set, File 1 through File 6.]

An image references a list of read-only layers, or differences (deltas)

A container begins as an R/W snapshot of the underlying image

I/O in the container causes data to be copied up (CoW) from image

CoW granularity may be file-level (e.g. AUFS) or block-level (e.g. DevMapper); the sketch below illustrates the cost difference

CoW impacts I/O performance

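To make the granularity trade-off concrete, here is a minimal sketch of the copy-up cost of a first 4KB write at each granularity. The 100MB file size is an arbitrary assumption; the 64KB chunk is dm-thin's default, which appears later in these slides.

```python
# Toy comparison of first-write copy-up cost: file-level CoW (AUFS-style)
# copies the whole backing file up; block-level CoW (dm-thin-style) copies
# only the containing chunk. Sizes are illustrative assumptions.

CHUNK = 64 * 1024          # dm-thin's default CoW chunk size
FILE_SIZE = 100 * 1024**2  # example: a 100MB file in a read-only layer
WRITE = 4 * 1024           # the application writes just 4KB

print(f"file-level copy-up:  {FILE_SIZE // WRITE}x write amplification")
print(f"block-level copy-up: {CHUNK // WRITE}x write amplification")
```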

Page 5

Current Docker Storage Drivers

- Pluggable storage driver design: configurable at daemon start, shared by all containers/images
- Uses snapshots and copy-on-write for space efficiency


Driver        | Backing FS              | Linux distribution
AUFS          | Any (EXT4, XFS)         | Ubuntu, Debian
Device Mapper | Any (EXT4, XFS)         | CentOS, RHEL, Fedora
OverlayFS     | Any (EXT4, XFS)         | CoreOS
ZFS           | ZFS (FS+vol, block CoW) | Extra install
Btrfs         | Btrfs (block CoW)       | Not very stable; Docker has no commercial support

Page 6

Recommendations From Docker

NO single driver is well suited to every use case. Considerations: stability, expertise, future-proofing.

“Use the default driver for your distribution”


https://docs.docker.com/engine/userguide/storagedriver/selectadriver/

Page 7

Deployment in Practical Environments

- DM & AUFS are likely the most widely used
  - Mature, stable
  - DM is in the kernel
  - Expertise widely available
  - CentOS, RHEL, Fedora; Ubuntu, Debian; …


Page 8

CoW Performance Penalty: DM/AUFS as an Example

Aspect            | DevMapper (dm-thin)                             | AUFS
Lookup            | Good: single-layer lookup (Btree)               | Bad: cross-layer traverse
I/O (CoW)         | Good: block-level CoW                           | Bad: file-level CoW
Memory efficiency | Bad: no shared page cache, multiple data copies | Good: shared page cache, single data copy


Page 9

CoW Performance Penalty Example


- Initial copy-up from image to container adds I/O latency
- Sibling containers suffer the copy-up again, even when accessing the same data
- First read by C1 is penalized by CoW
- Re-read of the same data by C1 is served from C1's page cache (20X faster)
- First read by C2 is penalized again due to DM CoW

[Chart: read bandwidth, CentOS, HDD, DM-thin. C1 first read 118 MB/s vs. C1 re-read 2500 MB/s; C2 first read 120 MB/s vs. C2 re-read 2500 MB/s.]

Page 10

Our Goals

- Target dense container (snapshot) environments: multiple containers from the same image, similar workloads
- Improve common CoW performance
  - Speed up cross-layer lookup (akin to AUFS)
  - Reduce disk I/O (akin to DM)
  - In the future, facilitate a single memory copy
- Software atop the existing CoW infrastructure
  - NO extra data cache, just efficient metadata
  - Cross-container: C1→C2, C2→C3, …
  - Cross-image (e.g. between Ubuntu and TensorFlow), as long as images derive from the same base layer
- Good scalability


Page 11

DRR: Data Relationship & Reference


- Metadata that describes the latest ownership of data
- Layer relationship:
  - A tree of base/snapshot relationships; R/W containers reside at the leaves
  - Look up the layer tree when the driver indicates data is shared
  - Update the layer tree when a layer is created or deleted (e.g. image pull, commit changes, or delete image)
- Data access record: tracks the latest I/O on shared data
  - Record: {LBN, Len, srcLayer, targetLayer, pageBitmap}
- DRR common operations (sketched below):
  - Add/update: add a new record when I/O completes on shared data
  - Lookup: for new I/O (read, small write) on shared data, reference a recent valid page cache instead of going to disk
  - Remove: a write invalidates and removes the record, since the data is now owned by that container

*Research project (i.e. not mature enough for production)
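A minimal, illustrative model of the record and the three operations above, assuming the record layout from this slide. This is a userspace sketch for exposition; the actual prototype is a kernel module, and all class and method names here are invented.

```python
from dataclasses import dataclass

@dataclass
class DrrRecord:
    lbn: int           # logical block number of the shared data
    length: int        # extent length
    src_layer: int     # srcLayer: layer whose page cache holds a valid copy
    target_layer: int  # targetLayer: layer that most recently referenced it
    page_bitmap: int   # pageBitmap: which pages of the extent are valid

class AccessTable:
    """Tracks the latest I/O on shared data (hypothetical name)."""

    def __init__(self):
        self.records = {}  # LBN -> DrrRecord

    def add_or_update(self, lbn, length, src_layer, target_layer, page_bitmap):
        # Add/update: called after I/O completes on shared data.
        self.records[lbn] = DrrRecord(lbn, length, src_layer,
                                      target_layer, page_bitmap)

    def lookup(self, lbn):
        # Lookup: a new read/small write on shared data asks for a recent,
        # valid page-cache copy to reference instead of hitting disk.
        return self.records.get(lbn)

    def remove(self, lbn):
        # Remove: a write invalidates the record, since the data is now
        # privately owned by the writing container.
        self.records.pop(lbn, None)
```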

Page 12


DRR I/O Diagram

1. C1 accesses file data for the first time; it is loaded from disk
2. A record is added to the DRR metadata after the I/O completes
3. Data is returned to C1
4. C2 accesses the same data
5. The DRR metadata is looked up for a valid reference
6. Data is memory-copied from C1's page cache via the normal file interface at the corresponding offset; DRR is updated to reflect the most recent reference

[Diagram: containers C1, C2, C3 (and more) above the CoW infrastructure (DevMapper, AUFS, …), all reading File-A; a global shared DRR per host (logically per snapFamily) mediates steps 1-6 in the kernel.]

Page 13

Linux DM-thin

- One of the DM target drivers; in the kernel for 5+ years, mature
- Shared block pool for multiple thin devices
- Internally a single layer (metadata CoW), so no cross-layer traverse
- Nicely supports many snapshots in depth (snap-of-snap)
- Metadata (Btree) and data CoW
- Granularity: 64KB chunks by default


[Diagram: DM-thin's 2-level Btree. A device tree keyed by Dev-ID points to per-device block trees mapping FS-LBN to PBN (in 64KB chunks); the thin pool holds the base dev (ID=1) and thin devices Dev2 through Dev5.]
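A toy model of the two-level lookup pictured above, with plain dicts standing in for the on-disk Btrees. The `shared` heuristic here (same PBN mapped by another device) is a simplification of how dm-thin actually decides sharedness; names like ThinPool and find_block are assumptions of this sketch.

```python
CHUNK = 64 * 1024  # default dm-thin chunk size

class ThinPool:
    def __init__(self):
        self.dev_tree = {}  # device tree: Dev-ID -> block tree {chunk: PBN}

    def create_thin(self, dev_id):
        self.dev_tree[dev_id] = {}

    def create_snapshot(self, origin_id, snap_id):
        # A snapshot starts out sharing every block mapping with its origin.
        self.dev_tree[snap_id] = dict(self.dev_tree[origin_id])

    def find_block(self, dev_id, lbn):
        """Rough analogue of dm_thin_find_block(): map (Dev-ID, LBN) to a
        PBN and report whether the block is shared with another device."""
        chunk = lbn // CHUNK
        pbn = self.dev_tree.get(dev_id, {}).get(chunk)
        shared = pbn is not None and any(
            d != dev_id and tree.get(chunk) == pbn
            for d, tree in self.dev_tree.items())
        return pbn, shared

pool = ThinPool()
pool.create_thin(1)
pool.dev_tree[1][0] = 100        # base dev (ID=1): chunk 0 -> PBN 100
pool.create_snapshot(1, 2)       # a container's thin device
print(pool.find_block(2, 0))     # -> (100, True): shared with the base
```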

Page 14

DRR Key Structures: LayerTree, AccessTable


[Diagram:
Access_Table: a hash keyed by bio.LBN; each chunk entry tracks versions & references: {OwnerDevID, refDevID[], pageRefList, state}.
Layer_Tree: the layer hierarchy. PoolBase (one per host) is the root; images (e.g. Ubuntu, Tensorflow) and snap families hang below it; RO layers are interior nodes (DevID1, ID2, …) and R/W container layers are the leaves.]
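The same two structures in sketch form, under the naming assumptions of this slide: a Layer_Tree of DevIDs with parent links, and an Access_Table entry per hot chunk. Again a userspace model, not the kernel code.

```python
class LayerTree:
    """Base/snapshot hierarchy; R/W containers are leaves."""

    def __init__(self):
        self.parent = {}  # DevID -> parent DevID (None for PoolBase)

    def add_layer(self, dev_id, parent_id):
        self.parent[dev_id] = parent_id

    def ancestors(self, dev_id):
        seen = set()
        while dev_id is not None:
            seen.add(dev_id)
            dev_id = self.parent.get(dev_id)
        return seen

    def same_family(self, a, b):
        # Layers can share DRR references as long as they descend from a
        # common base, even across images (e.g. Ubuntu vs. Tensorflow).
        return bool(self.ancestors(a) & self.ancestors(b))

class ChunkEntry:
    """One Access_Table entry: versions & references for a hot chunk."""

    def __init__(self, owner_dev):
        self.owner_dev = owner_dev  # OwnerDevID: layer owning this version
        self.ref_devs = []          # refDevID[]: recent referencing layers
        self.page_ref_list = []     # per-page references into page caches
        self.state = "valid"

access_table = {}  # hash keyed by bio.LBN -> ChunkEntry
```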

Page 15

DRR with DM-Thin

- Transparent to Docker and applications
- A kernel module between Docker/FS and DM-thin
- No change to the core DM-thin CoW essentials
  - Leverages existing metadata, hierarchy, addressing, …
  - Determines whether a block is shared: dm_thin_find_block()
  - Breaks block sharing: break_sharing()
- Applies to file data, and shared blocks only


[Diagram: a BIO carries (DevID, LBN, page); the page's address_space gives (*mapping, fileOffset, inode *host). The current Btree mapping stores Actual-PBN (40 bits) and CTime (24 bits), which denotes the owner dev, 1:1 with DevID.]

Page 16

New I/O Flow with DRR

A bio(dev, LBN, page) arrives from the app/filesystem (XFS, Ext4) at the DRR + DM-thin layer:

1. Look up the Btree.
2. Is the block shared, and is the I/O a read or a small write?
   - N: normal disk I/O.
   - Y: look up DRR (input: DevID, LBN).
     - Miss: disk I/O, then add a DRR entry (LBN, curDevID, ownerDevID).
     - Hit: memcpy from the recent page cache (file inode, offset from the current bio; clean pages).
3. On a read, add/refresh the DRR entry; on a write, remove the DRR entry and break sharing.

(A sketch of this flow follows.)
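Putting the pieces together, a sketch of the decision flow above, reusing the ThinPool, AccessTable, and CHUNK definitions from the earlier sketches. Disk and page-cache accesses are stubbed out; the control flow, not the stubs, is the point.

```python
def disk_read(pbn):
    return f"data@pbn{pbn}"            # stand-in for a real disk read

def copy_from_page_cache(rec):
    # Stand-in for the memcpy from a recent page cache via the normal
    # file interface (inode + offset derived from the current bio).
    return f"cached@lbn{rec.lbn}"

def handle_bio(pool, drr, dev_id, lbn, is_write=False, small_write=False):
    pbn, shared = pool.find_block(dev_id, lbn)      # 1. look up the Btree
    if shared and (not is_write or small_write):    # 2. shared + RD/small WR?
        rec = drr.lookup(lbn)                       #    look up DRR
        data = copy_from_page_cache(rec) if rec else disk_read(pbn)
        if is_write:
            drr.remove(lbn)  # write: invalidate record; CoW break_sharing()
                             # then proceeds underneath as usual
        else:
            # read: record the fresh page-cache copy for later referencers
            # (simplified: src/target layer bookkeeping collapsed to dev_id)
            drr.add_or_update(lbn, CHUNK, dev_id, dev_id, page_bitmap=-1)
        return data
    return disk_read(pbn)                           # normal disk I/O path
```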

Page 17

DRR: Preliminary Benchmark of PoC


- Kernel driver integrated with DM-thin: 1800+ LOC, no Docker code changes
- Environment: CentOS 7, Docker 1.10, DM-thin (64KB chunks), Ext4
- HDD-based system: E5410 @ 2.33GHz, 8 cores, 16GB DRAM; 600GB SATA disk configured as LVM-direct; CentOS 7, kernel 3.10.0-327
- PCIe SSD-based system: E5-2665 @ 2.40GHz, 32 cores, 48GB DRAM; OCZ PCIe SSD RM84 (1.2TB) as LVM-direct; CentOS 7, kernel 3.10.229

Page 18

DRR: Launch Storm Test (HDD)


- Launch TensorFlow containers; each container needs to load a ~104MB trained model plus parameters
- Result: up to 47% faster with DRR

Launch time (seconds) vs. number of containers:

Containers | 2 | 4  | 8  | 16 | 32 | 64
Baseline   | 6 | 13 | 24 | 49 | 98 | 200
DRR        | 5 | 7  | 14 | 26 | 66 | 142

Page 19

Container Launch Internals

Steps to start a container:
1. Create a new thin dev: DM-thin locking/serialization (out of current DRR scope)
2. Mount the Ext4 FS: touches FS metadata blocks, not file data (out of current DRR scope)
3. Load binary files and libraries (handled through DRR)
4. Load config files, parameter data, … (handled through DRR)

Page 20

DRR: Container I/O Test (HDD)

- Launch C1, C2 & C3 in order; then rm C1 and launch C4
- Issue the same I/O on each container to read a big file from the image
- 10X faster (~120MB/s -> 1.2GB/s)
- A local page-cache hit gets 2.5GB/s (1.2/2.5 = 48%, CPU: 2%)

[Chart: read I/O bandwidth & CPU. C1 first read 118 MB/s (warmup, ~10% CPU); C2 and C3 reach ~1100 MB/s on DRR hits (~5% CPU); after rm C1, C4 still reaches ~1200 MB/s (~5% CPU); page-cache re-reads hit 2500 MB/s (~2% CPU).]

Page 21

DRR: Computing + I/O Test (HDD)

- 2.7X faster execution time for md5sum <big file>
- Higher CPU (+4% vs. a local page-cache hit, due to block bio + DRR processing)

[Chart: md5sum time & CPU. Baseline (DRR off): 4.8 s at ~5% CPU; C1.md5 (DRR on, warmup): 4.8 s at ~5% CPU; C2/C3.md5 (DRR hit): 1.78 s at ~16% CPU; C3.md5 re-run (page-cache hit): 1.58 s at ~12% CPU.]

Page 22

DRR: Container I/O Test (PCIe SSD)

- Same steps as the HDD test
- 2.38X faster (840MB/s -> 2.0GB/s)
- A local page-cache hit reaches 3.9GB/s (2.0/3.9 = 51%)

[Chart: read I/O bandwidth & CPU. C1 first read 840 MB/s (warmup); C2, C3, and C4 (after rm C1) reach ~2000 MB/s on DRR hits; page-cache re-reads hit 3900 MB/s; CPU stays around 0-1% throughout.]

Page 23

DRR: Launch Storm & md5sum (PCIe SSD)

- Only a slight improvement
- Dominated by non-I/O factors (locking, processing, etc.); PCIe SSD reads are fast!


[Chart: md5sum run time (seconds). Baseline (SSD): 1.59; DRR hit: 1.57; local page-cache hit: 1.53.]

Page 24

DRR: I/O Heatmap utility


- A heatmap utility traces cross-layer I/O, with and without DRR
- Stats collected during TensorFlow launches on the SSD system
- Result: 2/3 reduction in disk I/O

[Heatmap tables (source layer -> target layer, total read I/O), without DRR and with DRR, omitted.]

Page 25

Discussion-1

- DRR CPU footprint: -5% ~ +10% vs. disk I/O, or +3% ~ +4% vs. a page-cache hit
- DRR memory footprint:
  - Not related to the number of containers (they are leaves), but related to layer depth (shared block versions)
  - Estimation: hot chunks × versions (layer depth) × ref# (layer depth); see the worked example below
  - e.g. a 50GB image, 10% hot, 5~10 layers deep with containers: roughly 100~400MB of memory
- If DRR misses 100% of the time: performance -2.5% (HDD) ~ -6.7% (PCIe)
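A back-of-the-envelope check of that estimate. The ~50 bytes per reference record is an assumption of this sketch, not a figure from the slides; it is chosen to show how the 100~400MB range falls out of the formula.

```python
CHUNK = 64 * 1024  # dm-thin chunk size

def drr_memory(image_bytes, hot_ratio, depth, entry_bytes=50):
    hot_chunks = int(image_bytes * hot_ratio) // CHUNK
    # hot chunks * versions(layer depth) * ref#(layer depth) * entry size
    return hot_chunks * depth * depth * entry_bytes

GB, MB = 1 << 30, 1 << 20
for depth in (5, 10):
    est = drr_memory(50 * GB, 0.10, depth) / MB
    print(f"layer depth {depth}: ~{est:.0f} MB")  # ~98 MB and ~391 MB
```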

Page 26

Discussion-2

- Benefits from dense-container locality
  - Many duplicate containers with similar workloads/patterns
  - The more containers running, the smaller the shared footprint per container
  - Automatic warm-up after reboot; no big difference
  - Access goes to the most recent copy, balanced and recursive: C1→C2, C2→C3, C3→C4, … (sketched below)
- Benefits small writes (the "copy" op)
  - Skip adding a DRR entry, since the data will be dirtied soon
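A sketch of that "reference the most recent copy" idea, using the DrrRecord and AccessTable from the page-11 sketch: each hit re-points the record at the newest referencer, so reads chain C1→C2→C3→… instead of every container hammering C1's page cache. The policy details here are an assumption, inferred from the bullet above.

```python
def pick_source(drr, lbn, reader_dev):
    """Return the layer to memcpy from, then make the reader the next
    source, forming balanced, recursive chains (C1->C2, C2->C3, ...)."""
    rec = drr.lookup(lbn)
    if rec is None:
        return None                    # no valid reference; go to disk
    src = rec.target_layer             # most recent referencer wins
    rec.src_layer = src
    rec.target_layer = reader_dev      # the reader becomes the next source
    return src
```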

Page 27

Discussion-3

- I/O path
  - Before: C1→FS1: page-cache miss → disk
  - Now: C2→FS2: page-cache miss → DRR → FS1
  - Could look up DRR before issuing the BIO, to shorten the I/O path
- Current DRR implementation
  - File data only, page-aligned I/O
  - In-memory only; not persistent
  - Next: LRU replacement? Partial hit + partial miss?
- Modify the Docker driver API? To facilitate control and management

Page 28

Outlook

- More comprehensive tests and tuning
- Prototyping with AUFS: reduce cross-layer lookup
- Think about memory deduplication with DM?
  - Offline, brute-force way: KSM (Kernel Same-page Merging, used for VM/VDI); DRR can provide fine-grained source/target memory-deduplication info
  - Inline, graceful fashion: a global FS data cache; a gap remains vs. enterprise/commercial FS (global data cache/dedup)


Page 29

Summary

- Targeted at dense container deployment environments
- DRR: a scalable solution to the common CoW drawback
  - Layer hierarchy plus access history
  - No extra data cache; exploits cross-container locality, even cross-image
  - Good scalability (grows with layer depth rather than container count)
  - Transparent; integrates with various CoW drivers without significant change
  - Especially promising for HDDs and I/O-heavy cases
- PoC with Linux DM: up to 10X on HDD, 2+X on PCIe SSD
- Opens new possibilities: VM/VDI-style solutions for the container era?


Page 30

Thank you! For any further feedback, please contact: [email protected] or [email protected]
