Page 1

© PernixData. All rights reserved.

Deepavali Bhagwat, Mahesh Patil, Michal Ostrowski, Murali Vilayannur, Woon Jung, Chethan Kumar

A Practical Implementation of Clustered Fault Tolerant Write Acceleration in a Virtualized Environment

FAST 2015

Page 2

I/O Acceleration in Virtualized Datacenters

▪ As virtualized datacenters scale, they experience I/O bottlenecks
  ▪ Virtual Machine (VM) density increases
  ▪ Cumulative VM I/O increases
  ▪ J. Shafer, “I/O Virtualization Bottlenecks in Cloud Computing Today,” WIOV ’10

▪ Provisioning better/more storage
  ▪ A disruptive, temporary fix
  ▪ Buying capacity to solve a performance problem

▪ Host-side flash
  ▪ Locate the application working set at the beginning of the I/O path to accelerate I/O
  ▪ A non-disruptive approach

Page 3

Host-side Flash for Accelerating I/O

▪ Host-side flash to accelerate reads alone (Write-Through)
  ▪ Reads are issued first to flash, and to the SAN on a cache miss
  ▪ Writes are issued to flash and the SAN, and acknowledged after SAN completion
  ▪ VM writes experience SAN latencies
  ▪ Byan et al., “Mercury: Host-side flash caching for the data center,” Mass Storage ’12
  ▪ Qin et al., “Reliable write-back for client-side flash caches,” USENIX ATC ’14

▪ Host-side flash to accelerate reads and writes (Write-Back)
  ▪ Reads are cached as in Write-Through
  ▪ Writes are issued to flash only and acknowledged after flash completion
  ▪ Writes are periodically flushed to the SAN
  ▪ VM writes experience flash latencies
  ▪ Holland et al., “Flash caching on the storage client,” USENIX ATC ’13

Page 4

Benefits of Using Host-side Flash for Write Acceleration

Cumulative throughput of two VMs:
• Microsoft Exchange Server JetStress: 32K random reads/writes, 14K sequential writes
• fio: 64K sequential writes

[Chart: MBytes Written/Sec vs. Time (sec), 0–600 s; series: wb, wt]

Page 5

Challenges using Host-side Flash for Write Acceleration

▪ Preserving VM mobility in virtualized environments
  ▪ Resource Management
  ▪ Power Management
  ▪ High Availability

▪ Sustained writes

▪ Fault Tolerance

▪ We introduce FVP:
  ▪ Seamless VM migration
  ▪ Flow Control to gracefully handle sustained writes
  ▪ Fault Tolerance by replicating VM writes to peers

Page 6

Acceleration Policies

▪ Write-Through (wt)

▪ Write-Back (wb)

▪ Write-Back with Peers (wbp)
  ▪ Writes are issued to flash and to peer hosts
  ▪ Acknowledged to the VM after flash and peer completion
  ▪ VM writes experience MAX(network, flash) latencies
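The wbp acknowledgement rule can be sketched as follows. This is a hypothetical illustration (the function names are invented, not FVP's API) of why the VM observes MAX(network, flash) latency rather than their sum: the two writes are issued concurrently, and the acknowledgement waits on both.

```python
import concurrent.futures

def wbp_write(write_to_flash, send_to_peers):
    # Hypothetical sketch: issue the write to local flash and to the
    # replication peers concurrently; acknowledge the VM only after
    # BOTH complete, so the VM observes MAX(network, flash) latency.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        flash = pool.submit(write_to_flash)
        peers = pool.submit(send_to_peers)
        # Waiting on both futures is what yields the MAX() latency.
        return flash.result() and peers.result()
```

Because the slower of the two paths gates the acknowledgement, a fast flash device cannot hide a slow replication network, and vice versa.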

Page 7

The Destager

▪ Dirty VM writes are periodically flushed to the SAN

▪ Each write is marked with a monotonically increasing serial number at entry

▪ Writes are batched
  ▪ Writes in a batch do not overlap
  ▪ Writes in a batch are issued concurrently to the SAN

▪ A checkpoint record is persisted to flash after every write in a batch completes

▪ To allow for a fair share of SAN bandwidth
  ▪ The destager cycles through VM write batches
  ▪ The batch size is capped

[Diagram: VM writes → Flash → Destager → SAN; a checkpoint follows each batch of non-overlapping writes]
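A minimal sketch of the batching rule described above — the data layout and names here are assumptions, not FVP internals. Dirty writes are walked in serial-number order; a batch stops at the first write that overlaps one already in the batch (which must therefore wait for the earlier write to destage) or when the size cap is reached, so the batch can be issued to the SAN concurrently.

```python
def build_batch(dirty_writes, max_batch=4):
    # Hypothetical sketch of the destager's batching rule: each write is
    # (serial, start_offset, length). Walk writes in serial order and
    # stop at the first overlapping write or at the cap; writes in the
    # resulting batch never overlap, so issuing them concurrently to
    # the SAN creates no reordering hazard.
    batch = []
    for serial, start, length in sorted(dirty_writes):
        if len(batch) >= max_batch:
            break
        overlaps = any(start < s + l and s < start + length
                       for _, s, l in batch)
        if overlaps:
            break  # an overlapping write must wait for a later batch
        batch.append((serial, start, length))
    return batch
```

Stopping (rather than skipping) at an overlap preserves write ordering: the later overlapping write is destaged only after the earlier one's batch completes.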

Page 8

What is a Checkpoint?

▪ A checkpoint consists of
  ▪ Serial number
  ▪ VM UUID

▪ For wbp VMs, the checkpoint is transmitted to peers

▪ The primary and peer hosts persist the checkpoint onto their host-side flash

▪ The primary and peer hosts evict writes whose serial number <= the checkpoint’s serial number

▪ Checkpoints reduce recovery time
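The eviction rule can be illustrated with a small sketch (the dict-of-writes layout is hypothetical): every cached write at or below the persisted checkpoint's serial number has already reached the SAN, so both the primary and its peers can drop it from flash.

```python
def apply_checkpoint(cached_writes, checkpoint_serial):
    # Hypothetical sketch of checkpoint-driven eviction: cached_writes
    # maps serial number -> write data. Writes with serial number <=
    # the checkpoint's are already on the SAN and can be evicted; only
    # newer writes must be kept (and replayed after a failure).
    return {serial: data for serial, data in cached_writes.items()
            if serial > checkpoint_serial}
```

This is also why checkpoints reduce recovery time: replay after a failure only needs to cover the writes that survive this filter.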

Page 9

Seamless VM Migration

▪ Every host has access to every other host’s flash
▪ After migration, VM reads are transmitted to the previous host
▪ The new host builds up the VM’s footprint
▪ VM reads experience (network + flash) latencies
▪ After a certain period of time, or after a certain number of cache misses, the new host stops transmitting reads to the previous host
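The post-migration read path above can be sketched as follows — a hypothetical illustration, not FVP code: serve from the new host's flash on a hit; otherwise, until the cutoff, fetch the block from the previous host's flash (network + flash latency) and copy it locally so the new host rebuilds the VM's footprint.

```python
def migrated_read(block, local_cache, remote_cache, read_san, use_remote):
    # Hypothetical sketch of the read path after VM migration.
    # use_remote models the cutoff: after a certain time or number of
    # cache misses, the new host stops consulting the previous host.
    if block in local_cache:
        return local_cache[block]          # local flash hit
    if use_remote and block in remote_cache:
        data = remote_cache[block]         # previous host's flash
    else:
        data = read_san(block)             # fall back to the SAN
    local_cache[block] = data              # warm the new host's footprint
    return data
```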

Page 10

Flow Control

▪ Sustained writes fill up the flash space allocated to a VM

▪ The VM then has to be stalled and experiences degraded performance

▪ Flow Control:
  ▪ Slow down the VM to allow the destager to catch up
  ▪ Avoid stalling the VM

▪ In the worst case, the VM experiences SAN latency (which it would anyway without FVP)

▪ FVP uses heuristics to trigger flow control

▪ Most applications are bursty; short write bursts are absorbed gracefully
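One possible shape for such a heuristic is sketched below. FVP's actual trigger conditions are not specified in the slides; this linear-backoff rule is purely illustrative, with made-up thresholds.

```python
def throttle_delay(dirty_bytes, capacity, max_delay_ms=10.0):
    # Hypothetical flow-control heuristic (NOT FVP's actual rule):
    # as the VM's flash allocation fills with dirty data, inject a
    # growing per-write delay so the destager can catch up, instead
    # of letting the VM hit a hard stall when flash is full.
    fill = dirty_bytes / capacity
    if fill < 0.5:
        return 0.0  # short bursts are absorbed with no throttling
    # Scale delay linearly from 0 at half full to max_delay_ms at full,
    # where max_delay_ms would approximate SAN-like write latency.
    return min(max_delay_ms, (fill - 0.5) / 0.5 * max_delay_ms)
```

The key property is graceful degradation: a short burst pays nothing, and a sustained burst converges toward SAN-like latency rather than a stall.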

Page 11

Fault Tolerance with Host-side flash

▪ In case of failure, the SAN is in an inconsistent state with respect to the VM

▪ Affected VMs could be migrated to other hosts for High Availability

▪ Data corruption possible

▪ On-disk locks prevent hosts from corrupting VM data on the SAN
  ▪ A lock for every VM; ownership arbitrated by the Virtual Machine File System
  ▪ Only one host can acquire the lock; that host is eligible to issue I/Os on behalf of the VM
  ▪ Locks are persisted on the VM’s datastore

▪ The lock contents
  ▪ VM’s last checkpoint (serial number, VM UUID)
  ▪ VM acceleration policy

▪ A copy of the lock is kept in a read-only lock file
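The lock contents listed above could be serialized as in this sketch; the JSON layout and field names are assumptions for illustration, not FVP's on-disk format.

```python
import json

def make_lock_record(vm_uuid, checkpoint_serial, policy):
    # Hypothetical sketch of the per-VM on-disk lock contents: the
    # VM's last checkpoint (serial number + UUID) and its acceleration
    # policy, persisted on the VM's datastore. A read-only copy of
    # this record lets other hosts observe recovery progress without
    # owning the lock.
    assert policy in ("wt", "wb", "wbp")
    return json.dumps({"vm_uuid": vm_uuid,
                       "checkpoint_serial": checkpoint_serial,
                       "policy": policy})
```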

Page 12

Fault Tolerance

▪ Flash Failure
▪ Host Failure
▪ Network Failure
▪ SAN Failure

Page 13

Flash Failure

▪ The host relinquishes locks for affected VMs

▪ VM in wt: No data loss

▪ VM in wb: Recovery not possible. VM stalled.

▪ VM in wbp: VM stalled. Recovery via Online Replay
  ▪ Peers periodically attempt to acquire locks; one peer succeeds
  ▪ The successful peer destages replicated writes from the last checkpoint onward
  ▪ Releases the lock

▪ Migrated wb/wbp VM (HA)
  ▪ The new host acquires the lock
  ▪ Detects the VM’s wb/wbp policy
  ▪ Infers dirty writes are still pending
  ▪ Stalls the VM
  ▪ Releases the lock
  ▪ Periodically polls the read-only lock, waiting for a peer to complete Online Replay
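Online Replay can be sketched as below — a hypothetical illustration (data layout and callback names are invented): peers race for the failed VM's lock, and the single winner destages its replicated copies of the writes newer than the last checkpoint, in serial order, before releasing the lock.

```python
def online_replay(peers, try_acquire_lock, destage, release_lock,
                  checkpoint_serial):
    # Hypothetical sketch of Online Replay after a flash failure.
    # `peers` maps a peer host name to its replicated writes for the
    # affected VM, as {serial: data}. The on-disk lock guarantees that
    # only one peer wins the race and destages.
    for name, writes in peers.items():
        if not try_acquire_lock(name):
            continue  # another peer won the race for this VM's lock
        # Destage only writes newer than the last checkpoint, in order.
        for serial in sorted(s for s in writes if s > checkpoint_serial):
            destage(serial, writes[serial])
        release_lock(name)
        return name
    return None
```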

Page 14

Host Failure

▪ Virtual Machine File System releases locks as part of failure detection

▪ VM in wt: No data loss

▪ VM in wb: Recovery via Offline Replay
  ▪ On recovery, the host acquires locks
  ▪ FVP scans the flash device and destages all writes after the last checkpoint
  ▪ The checkpoint is regularly updated in the lock file
  ▪ VMs are kept stalled until their writes are destaged

▪ VM in wbp via Online Replay

▪ Migrated wb/wbp VM: The new host stalls the VM until replay completes
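Offline Replay can be sketched as follows — a hypothetical illustration (the log layout is invented): on recovery, the host scans its own flash device and destages, in serial order, every logged write newer than the checkpoint recorded in the lock file, keeping the VM stalled until this finishes.

```python
def offline_replay(flash_log, checkpoint_serial, destage):
    # Hypothetical sketch of Offline Replay after a host failure.
    # flash_log is the list of (serial, data) write records recovered
    # by scanning the flash device; only writes newer than the last
    # checkpoint need to be destaged, in serial-number order.
    pending = sorted((s, d) for s, d in flash_log if s > checkpoint_serial)
    for serial, data in pending:
        destage(serial, data)
    return len(pending)  # the VM stays stalled until these complete
```

Because the checkpoint is regularly updated in the lock file, `pending` stays small and recovery time is bounded.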

Page 15

Cascading Failures, Distributed Recovery

▪ Checkpoints are persisted regularly to speed up recovery from cascading failures

▪ FVP can recover from up to p flash failures

▪ FVP can recover from multiple host failures: all write metadata plus the checkpoint is persisted to flash

▪ Recovery is distributed

Page 16

Experimental Evaluation

▪ VM migration

▪ How FVP absorbs short write bursts

▪ How FVP uses Flow Control to handle sustained write bursts

▪ Fault Tolerance: cost vs. benefits

Page 17

VM Migration

▪ Random 4K reads (Iometer)
▪ Latency drops as cache hits increase
▪ After VM migration, remote reads incur additional network latency
▪ The new host builds the VM’s footprint; latencies drop as cache hits increase on the new host

[Chart: Average Latency/Read (ms) vs. Time (sec), 0–1400 s; annotations: VM Migration, Remote Reads (Host 2), Cache Warm-up, Cache Hits: 100%]

Page 18

Sustained Write Bursts

[Chart (a): Writes (1000)/Sec vs. Time (s), 0–600 s; series: VM (FVP), Destager (SAN)]

[Chart (b): Observed Average Latency/Write (ms) vs. Time (s), 0–600 s; series: VM (wb), Flash device]

[Diagram: VM → Flash → Destager → SAN; writes acknowledged by Flash see flash latencies, writes acknowledged by the SAN see SAN latencies]

Page 19

Fault Tolerance Cost vs. Benefits

▪ Two VMs: Microsoft Exchange Server JetStress (reads and writes) and Ubuntu (writes)
▪ Replicating writes across peers incurs additional network latencies
▪ Peering reduces VM throughput, but protects against failures
▪ Throughput with peers is still better than wt alone
▪ Peering = fault tolerance + better acceleration than wt

[Chart: MBytes Written/Sec vs. Time (sec), 0–600 s; series: wt, wb, wbp (p=1), wbp (p=2)]

Page 20

Conclusions and Future Work

▪ FVP achieves seamless fault tolerant write back acceleration while preserving VM mobility (DRS, HA)

▪ Absorbs short write bursts
  ▪ Masks VMs from SAN latency spikes
  ▪ Preserves VM performance predictability to help deliver on SLA objectives

▪ FVP handles sustained write bursts gracefully using Flow Control

▪ Future work: Building more intelligence/adaptability into FVP

Page 21

Deepavali Bhagwat, Mahesh Patil, Michal Ostrowski, Murali Vilayannur, Woon Jung, Chethan Kumar

A Practical Implementation of Clustered Fault Tolerant Write Acceleration in a Virtualized Environment

FAST 2015