Embrace storage with open arm · 2020. 12. 22.

Embrace high performance storage with open arm
Richael Zhuang, Arm
© 2020 Arm Limited (or its affiliates)




What’s SPDK

• Storage Performance Development Kit

• A set of tools and libraries for creating high-performance, scalable, user-mode storage applications


What’s SPDK

• Key techniques
  • User mode driver (uio/vfio)
  • Poll mode instead of interrupt
  • Shared-nothing thread model
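The poll-mode and shared-nothing ideas above can be sketched in a few lines. This is only an illustration, not SPDK's API: `FakeDevice`, `Reactor`, and their methods are hypothetical names, and the "device" completes every request immediately. The point is that each reactor owns its own queue (so no locks are needed) and discovers completions by polling rather than by waiting for an interrupt.

```python
from collections import deque


class FakeDevice:
    """Hypothetical device: completions are found by polling, not by interrupts."""

    def __init__(self):
        self._inflight = deque()

    def submit(self, request_id):
        # A real device takes time; here every request is complete at once.
        self._inflight.append(request_id)

    def poll_completions(self, max_completions=32):
        """Return up to max_completions finished request ids without blocking."""
        done = []
        while self._inflight and len(done) < max_completions:
            done.append(self._inflight.popleft())
        return done


class Reactor:
    """One reactor per core. It owns its device queue, so nothing is shared."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.device = FakeDevice()  # private to this reactor (shared-nothing)
        self.completed = []

    def run(self, request_ids):
        for request_id in request_ids:
            self.device.submit(request_id)
        # Busy-poll until all requests have completed: no interrupt, no sleep.
        while len(self.completed) < len(request_ids):
            self.completed.extend(self.device.poll_completions())
        return self.completed


reactors = [Reactor(core_id) for core_id in range(2)]
results = [r.run([f"core{r.core_id}-io{i}" for i in range(3)]) for r in reactors]
```

In this sketch the reactors run one after another for simplicity; in a real poll-mode design each would run on its own pinned core.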


SPDK on Arm64

• 50+ patches to enable and optimize SPDK are merged
  • Memory barrier
  • Base64, crc32, isa-l
• Enable SPDK NVMe over Fabrics
  • RDMA
  • TCP (posix, uring, vpp, mtcp)
• Enable SPDK vhost target (for VM)
• SPDK-CSI (for container)
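As an illustration of the kind of routine the crc32 item refers to, here is a bit-by-bit reference implementation of CRC-32C (Castagnoli). That the slide means CRC-32C specifically is an assumption on my part, and this is not the optimized Arm64 code from the patches; a hardware-accelerated version should produce the same values as this slow reference.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C; pass a previous result as `crc` to continue a running checksum."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial when that bit was set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


check = crc32c(b"123456789")  # standard check input for CRC catalogues
```

A reference like this is mainly useful as a test oracle when validating an accelerated implementation.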


NVMe over Fabrics

• Local access: PCIe (shared memory)
  • NVMe: specification for SSD access via PCI Express (PCIe)
• Remote access: message-based transport
  • Fibre Channel/RDMA/TCP

[Figure: the NVMe host driver on the host, with one admin queue and I/O queues per CPU core (core 0 … core n), reaches the NVMe controller's SQ/CQ pairs through a transport-dependent interface: memory, PCIe registers, or fabric capsule operations.]
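The submission queue (SQ) and completion queue (CQ) pairing shown here can be sketched as two fixed-size rings. This is a simplified model with hypothetical names (`QueuePair`, `submit`, `controller_process`, `reap`); it leaves out doorbell registers, the phase bit, and everything transport-specific.

```python
class QueuePair:
    """Simplified NVMe-style queue pair: one submission ring, one completion ring."""

    def __init__(self, depth):
        self.depth = depth
        self.sq = [None] * depth
        self.cq = [None] * depth
        self.sq_head = self.sq_tail = 0  # controller consumes at head, host adds at tail
        self.cq_head = self.cq_tail = 0  # host consumes at head, controller adds at tail

    def submit(self, command):
        """Host side: queue a command; returns False when the ring is full."""
        next_tail = (self.sq_tail + 1) % self.depth
        if next_tail == self.sq_head:
            return False  # one slot stays empty to tell "full" from "empty"
        self.sq[self.sq_tail] = command
        self.sq_tail = next_tail  # a real host would now ring the SQ tail doorbell
        return True

    def controller_process(self):
        """Controller side: consume commands and post one completion for each."""
        while self.sq_head != self.sq_tail:
            next_cq_tail = (self.cq_tail + 1) % self.depth
            if next_cq_tail == self.cq_head:
                break  # completion ring full; wait for the host to reap
            command = self.sq[self.sq_head]
            self.sq_head = (self.sq_head + 1) % self.depth
            self.cq[self.cq_tail] = ("done", command)
            self.cq_tail = next_cq_tail

    def reap(self):
        """Host side: collect all posted completions."""
        completions = []
        while self.cq_head != self.cq_tail:
            completions.append(self.cq[self.cq_head])
            self.cq_head = (self.cq_head + 1) % self.depth
        return completions


qp = QueuePair(depth=4)
accepted = [qp.submit(f"read-{i}") for i in range(4)]
qp.controller_process()
completions = qp.reap()
```

With a depth of 4 the ring holds three commands, so the fourth submission is rejected until completions are reaped.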


NVMe over RDMA
• RDMA
  • host-offload, host-bypass (RNICs)
  • Queue Pairs (QP = SQ + RQ) and Completion Queues (CQs)
• NVMe over RDMA
  • Each NVMe qpair is mapped to an RDMA qpair
  • Retains NVMe SQ/CQ CPU alignment
  • NVMe commands are encapsulated, put into RDMA qpairs, and sent over RNICs

[Figure: NVMe host driver with an admin queue and per-core I/O queues (CPU core 0 … core n); each NVMe SQ/CQ pair maps to an RDMA fabric context with its own QP (SQ + RQ) and CQ.]


NVMe over RDMA performance

[Figure: NVMe over RDMA & PCIe performance. Bar chart of bandwidth (MiB/s, axis 0 to 2000) against number of cores (1, 2, 4), with randwrite and randread for each; series: 1 NVMe over RDMA and 1 NVMe on local PCIe. Data labels in extraction order: 1099, 1327, 1134, 1691, 1147, 1724, 1123, 1762, 1137, 1714, 1133, 1722.]

• 1 NVMe750 in target
• MLX5 NICs (RoCE v2)
• 4 KB payload size, queue depth 128


NVMe over TCP

[Figure: NVMe over TCP layering. The host-side NVMe-TCP transport (send CMD capsule, receive RSP capsule) and the controller-side NVMe-TCP transport (receive CMD capsule, send RSP capsule) sit between the NVMe SQ/CQ and the socket APIs (send(bytes), receive(bytes)). Below this NVMe-oF/NVMe-TCP layer is a typical TCP network stack: TCP transport over a TCP connection, IP network, physical network (e.g. Ethernet).]

• NVMe block storage protocol over standard TCP/IP transport

• TCP provides a reliable transport layer for NVMe queueing model

• Each NVMe queue pair mapped to a TCP connection

• NVMe-oF commands are sent over standard TCP/IP sockets
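To make "commands sent over sockets" concrete, the sketch below frames one command capsule and sends it across a stream socket. The 8-byte common header layout (type, flags, header length, data offset, total length) and the command-capsule type value 0x04 follow my reading of the NVMe/TCP transport specification and should be treated as assumptions. The connection handshake, digests, and in-capsule data are left out, and `socket.socketpair()` stands in for a real TCP connection.

```python
import socket
import struct

PDU_TYPE_CAPSULE_CMD = 0x04              # assumed value for a command capsule PDU
COMMON_HEADER = struct.Struct("<BBBBI")  # type, flags, hlen, pdo, plen (little-endian)
SQE_SIZE = 64                            # an NVMe submission queue entry is 64 bytes


def build_capsule_cmd(sqe: bytes) -> bytes:
    """Wrap a 64-byte submission queue entry in a command capsule PDU."""
    if len(sqe) != SQE_SIZE:
        raise ValueError("submission queue entry must be 64 bytes")
    hlen = COMMON_HEADER.size + SQE_SIZE  # no data and no digests, so plen == hlen
    return COMMON_HEADER.pack(PDU_TYPE_CAPSULE_CMD, 0, hlen, 0, hlen) + sqe


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes; a stream socket may return data in pieces."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_pdu(sock: socket.socket):
    """Read one PDU: the common header first, then the rest as given by plen."""
    pdu_type, _flags, _hlen, _pdo, plen = COMMON_HEADER.unpack(
        recv_exact(sock, COMMON_HEADER.size)
    )
    return pdu_type, recv_exact(sock, plen - COMMON_HEADER.size)


host, controller = socket.socketpair()   # stand-in for one queue pair's connection
sqe = bytes([0x02]) + bytes(63)          # opcode 0x02 (Read), remaining fields zeroed
host.sendall(build_capsule_cmd(sqe))
pdu_type, body = recv_pdu(controller)
host.close()
controller.close()
```

Because TCP is a byte stream, the receiver relies on the length field in the header to find PDU boundaries, which is why `recv_exact` loops.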


NVMe over TCP in SPDK
• POSIX (released, stable, no dependency on kernel)
• Uring (released, experimental, Linux kernel > 5.4.3)
  • io_uring: a new Linux asynchronous I/O interface
• VPP (released; VPP integration test will be stopped in 20.07)
  • Vector Packet Processing (VPP): a fast network data plane on top of DPDK
• Seastar (some work done, but not ready; needs further investigation)
  • an event-driven framework

[Figure: the Sock Abstraction layer sits above four implementations: POSIX, uring, VPP, seastar.]


NVMe over TCP performance

• 1 NVMe P4600 in target

• 4 KB payload size, queue depth 128

[Figure: SPDK NVMe over TCP performance. Bar chart of bandwidth (MiB/s, axis 0 to 2500) against number of cores (1, 2, 4, 8), with randwrite and randread for each; series: posix and uring. Data labels in extraction order: 576, 481, 1096, 934, 1443, 1752, 1534, 2212, 651, 480, 1080, 940, 1482, 1730, 1528, 2229.]


SPDK & VMs

• Virtio
  • I/O paravirtualization specification
  • abstraction layer above a set of common emulated devices in a paravirtualized hypervisor
  • common mechanism and layouts for device discovery/configuration
  • common mechanism for front-end and back-end to communicate
• Vhost
  • virtio offloads part of its operations to the host (kernel or user mode)
  • vhost-kernel
    – vhost module in the kernel transfers data with the guest
  • vhost-user
    – vhost backend in user space transfers data with the guest

[Figure: two stacks. Virtio: a guest VM (Linux*, Windows*, FreeBSD*, etc.) with virtio front-end drivers, connected by a virtqueue to virtio back-end drivers in the hypervisor's (e.g. QEMU/KVM) device emulation. Vhost: the same stack, with the virtqueue served through vhost by a vhost target (kernel or userspace).]
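The front-end/back-end exchange over a virtqueue can be sketched as a descriptor table plus an "available" list (driver to device) and a "used" list (device to driver). This is a toy model with hypothetical method names; real virtqueues are fixed-layout rings in guest memory with descriptor chaining, index wrap-around, and notifications, none of which appear here.

```python
class Virtqueue:
    """Toy split virtqueue: descriptor table, available list, used list."""

    def __init__(self, size):
        self.desc = [None] * size      # descriptor table: one buffer per slot
        self.free = list(range(size))  # descriptor slots not currently in use
        self.avail = []                # driver -> device: descriptor indices
        self.used = []                 # device -> driver: (index, bytes written)

    # Front-end (guest driver) side
    def add_buffer(self, data: bytes) -> int:
        """Place a request in a free descriptor and offer it to the device."""
        index = self.free.pop()
        self.desc[index] = bytearray(data)
        self.avail.append(index)
        return index

    def get_used(self):
        """Collect buffers the device has finished with and recycle their slots."""
        results = []
        while self.used:
            index, length = self.used.pop(0)
            results.append(bytes(self.desc[index][:length]))
            self.desc[index] = None
            self.free.append(index)
        return results

    # Back-end (device or vhost target) side
    def process(self, handler):
        """Consume every available buffer and hand it back with the reply in place."""
        while self.avail:
            index = self.avail.pop(0)
            reply = handler(bytes(self.desc[index]))
            self.desc[index] = bytearray(reply)
            self.used.append((index, len(reply)))


vq = Virtqueue(size=4)
vq.add_buffer(b"read lba 0")
vq.process(lambda request: request.upper())  # stand-in for real device work
replies = vq.get_used()
```

The same queue structure is used whether the back-end lives in the hypervisor, in the kernel (vhost-kernel), or in a user space process (vhost-user); only who runs `process` changes.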


SPDK & VMs

• SPDK vhost target
  • Leverages the vhost-user protocol to provide the backend storage for VMs
• Vhost-user slave
  • The VM shares hugepage memory with a userspace process (SPDK)
  • SPDK transfers data with the VM through virtqueues (data path)
  • A Unix domain socket carries control messages between the processes (control path)
• Vhost-scsi/vhost-blk/vhost-nvme (experimental)
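The split between control path and data path can be illustrated with a Unix domain socket and a shared memory mapping. This loosely mirrors the idea only: in the real vhost-user protocol the socket carries file descriptors for guest memory and queue setup messages, and data buffers are described by virtqueue descriptors. Here a single hypothetical `(offset, length)` message and an anonymous `mmap` stand in for all of that, and both sides run in one process.

```python
import mmap
import socket
import struct

CONTROL_MESSAGE = struct.Struct("<QI")  # hypothetical message: offset, length

# Anonymous mapping standing in for hugepage memory shared by the VM and SPDK.
shared = mmap.mmap(-1, 4096)

# Unix domain socket pair standing in for the vhost-user control channel.
vm_sock, backend_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# "VM" side: write the payload into shared memory, then announce where it is.
payload = b"block data"
offset = 128
shared[offset:offset + len(payload)] = payload
vm_sock.sendall(CONTROL_MESSAGE.pack(offset, len(payload)))

# "Backend" side: only the small control message crosses the socket;
# the payload itself is read straight from the shared mapping.
message = backend_sock.recv(CONTROL_MESSAGE.size, socket.MSG_WAITALL)
msg_offset, msg_length = CONTROL_MESSAGE.unpack(message)
received = bytes(shared[msg_offset:msg_offset + msg_length])

vm_sock.close()
backend_sock.close()
shared.close()
```

The design point is that bulk data never travels through the socket, so the data path avoids copies and system calls per I/O.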


SPDK & container

• Container Storage Interface (CSI)
  • a standard for exposing arbitrary block and file storage systems to containerized workloads on Container Orchestration Systems (COs) like Kubernetes
  • NOTE: CSI is a general protocol, not for Kubernetes only
• Components
  • Controller Driver
    – Talks to the Service Provider (SP) to create/delete volumes
  • Node Driver
    – Mounts/unmounts remote volumes on the local host

[Figure: a CO (K8s, Mesos, …) with one master node running the Controller Driver and three worker nodes, each running a Node Driver.]
~ Controller driver on CO master node
~ Node driver instances per CO worker
~ CO talks to CSI Drivers with CSI RPC messages
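The division of labour between the two drivers can be sketched with in-memory stand-ins. The method names loosely echo CSI RPC names such as CreateVolume, DeleteVolume, NodePublishVolume, and NodeUnpublishVolume, but the classes, arguments, and volume id format below are hypothetical; a real driver implements these as gRPC services and talks to an actual storage provider.

```python
import itertools


class ControllerDriver:
    """Runs once per cluster; asks the storage provider to create or delete volumes."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.volumes = {}  # volume name -> {"volume_id": ..., "size_bytes": ...}

    def create_volume(self, name: str, size_bytes: int) -> str:
        # Repeating a request with the same name returns the existing volume.
        if name not in self.volumes:
            self.volumes[name] = {
                "volume_id": f"vol-{next(self._ids)}",
                "size_bytes": size_bytes,
            }
        return self.volumes[name]["volume_id"]

    def delete_volume(self, volume_id: str) -> None:
        self.volumes = {
            name: volume
            for name, volume in self.volumes.items()
            if volume["volume_id"] != volume_id
        }


class NodeDriver:
    """Runs on every worker; attaches remote volumes at local paths."""

    def __init__(self, node_name: str):
        self.node_name = node_name
        self.mounts = {}  # target path -> volume id

    def publish_volume(self, volume_id: str, target_path: str) -> None:
        self.mounts[target_path] = volume_id

    def unpublish_volume(self, target_path: str) -> None:
        self.mounts.pop(target_path, None)


controller = ControllerDriver()
workers = [NodeDriver(f"worker-{i}") for i in range(3)]

volume_id = controller.create_volume("pvc-demo", 1 << 30)
same_id = controller.create_volume("pvc-demo", 1 << 30)
workers[1].publish_volume(volume_id, "/mnt/demo")
```

The container orchestrator drives both: it asks the controller for a volume first, then asks the node driver on whichever worker runs the workload to make it available locally.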


SPDK-CSI

• Kubernetes supports CSI well
  • CSI spec 1.0 supported since Kubernetes 1.13
• Kubernetes CSI Drivers List
• SPDK-CSI: Bring SPDK to Kubernetes
  • Brings SPDK to Kubernetes storage through NVMe-oF and iSCSI
  • Supports dynamic volume provisioning
  • Enables Pods to use SPDK for transient or persistent storage
  • Released in 20.07, initiated by Arm


SPDK-CSI overview

[Figure: SPDK-CSI overview (diagram labeled gRPC).]


What’s next?

• NVMe over Fabrics
  • Integrate and optimize SPDK NVMe over TCP with mTCP as the user space TCP stack
  • Optimize SPDK NVMe over TCP with uring socket
• SPDK CSI
  • Tests and improvements for production-level quality
  • New features
    – Topology, volume expansion, snapshot, etc.
    – See Backlogs and Todos at Trello Board
  • Integration with Rook
    – Build a total solution for leveraging SPDK in Kubernetes


Welcome Contribution
• Code review at SPDK Gerrit
  • git clone https://review.spdk.io/spdk/spdk-csi
  • GitHub mirror: https://github.com/spdk/spdk-csi
• Development Guidelines
  • https://spdk.io/development/
• Trello Board
  • https://trello.com/b/nBujJzya/kubernetes-integration


Q&A

[email protected]


The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.

www.arm.com/company/policies/trademarks
