Autonomous NIC Offload


Transcript of Autonomous NIC Offload

Page 1: Autonomous NIC Offload

Boris Pismenny, Yoray Zack, Ben Ben-Ishay and Or Gerlitz

AUTONOMOUS NVME-TCP OFFLOAD

Page 2: Autonomous NIC Offload

Overview

• Motivation

• Storage protocol offload

• Seamless integration

• APIs and implementation


Page 3: Autonomous NIC Offload

Motivation: offload opportunities

• Transmit side data checksum calculation
  • PDU data CRC calculation

• Receive side data checksum validation
  • PDU data CRC verification

• Receive side copy
  • Need to place data at destination buffers
  • But TCP receives data in anonymous unaligned buffers
  • Data is copied from TCP to destination buffers

[Figure: PDU layout: Header, Data, CRC]

Page 4: Autonomous NIC Offload

Motivation: offload opportunities

• Copy and CRC consume up to 50% of the CPU cycles spent per IO

Page 5: Autonomous NIC Offload

Motivation: NVMe out-of-order processing

• Generic zerocopy receive does not work

• NVMe supports reordering of storage read/write operations

[Figure: Initiator and Target exchanging reads A and B; the reads are reordered]

Problem: Generic zerocopy receive writes to the wrong buffer

To solve this problem, we need upper layer protocol awareness!
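The reordering problem can be illustrated with a toy model. This is a minimal sketch, not the NVMe-TCP wire format: responses are modelled as hypothetical (CID, payload) pairs, and `place_fifo` / `place_by_cid` are illustrative names. Filling posted buffers in arrival order puts B's data in A's buffer, while a lookup keyed by the command identifier does not.

```python
def place_fifo(buffers_in_issue_order, responses):
    """Generic zerocopy: fill posted buffers in arrival order, ignoring the CID."""
    for buf, (_cid, payload) in zip(buffers_in_issue_order, responses):
        buf[: len(payload)] = payload


def place_by_cid(cid_to_buffer, responses):
    """ULP-aware placement: pick the destination buffer by the response CID."""
    for cid, payload in responses:
        cid_to_buffer[cid][: len(payload)] = payload


# Reads were issued as A (CID 1) then B (CID 2), but B completes first.
responses = [(2, b"BBBB"), (1, b"AAAA")]

buf_a, buf_b = bytearray(4), bytearray(4)
place_fifo([buf_a, buf_b], responses)
fifo_result = (bytes(buf_a), bytes(buf_b))   # A's buffer now holds B's data

buf_a, buf_b = bytearray(4), bytearray(4)
place_by_cid({1: buf_a, 2: buf_b}, responses)
cid_result = (bytes(buf_a), bytes(buf_b))    # each buffer holds its own data
```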

Page 6: Autonomous NIC Offload

Transmit offload overview

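[Figure: baseline transmit path through the App, NVMe-TCP, TCP/IP, NIC and Network layers. Legend: NVMe-TCP PDU header/trailer (H, T), TCP header (H). The CRC is computed on the CPU (⇐crc), and the complete packet, H H data T, is handed to the NIC and sent to the network.]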

Page 7: Autonomous NIC Offload

Transmit offload overview

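[Figure: baseline and offload transmit paths side by side, each through the App, NVMe-TCP, TCP/IP, NIC and Network layers. Legend: NVMe-TCP PDU header/trailer (H, T), TCP header (H). Baseline: the CPU computes the CRC (⇐crc) and passes H H data T down to the NIC. Offload: the CPU passes down H H data 0, with the trailer left as 0, and the NIC computes the CRC (⇐crc, "Offload CRC") so that H H data T goes out on the network.]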
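The transmit idea can be sketched in a few lines. This is an illustration under assumptions, not the driver implementation: the header bytes are arbitrary, the trailer is assumed to be a 4-byte little-endian CRC32C over the data, and `nic_fill_crc` is a hypothetical stand-in for the hardware step.

```python
import struct


def crc32c(data, crc=0):
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def build_pdu_baseline(header, data):
    """Baseline: the CPU computes the data CRC and writes the trailer."""
    return header + data + struct.pack("<I", crc32c(data))


def build_pdu_for_offload(header, data):
    """Offload: software leaves a zeroed 4-byte trailer as a placeholder."""
    return header + data + b"\x00" * 4


def nic_fill_crc(pdu, header_len):
    """Stand-in for the NIC: compute the CRC over the data, fill the trailer."""
    data = pdu[header_len:-4]
    return pdu[:-4] + struct.pack("<I", crc32c(data))


header, data = b"HDR!", b"payload"
on_wire = nic_fill_crc(build_pdu_for_offload(header, data), len(header))
```

Either path puts the same bytes on the wire; the difference is only where the CRC cycles are spent.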

Page 8: Autonomous NIC Offload

Receive offload overview

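[Figure: baseline receive path through the Network, NIC, TCP/IP, NVMe-TCP and App layers. Legend: NVMe-TCP PDU header/trailer (H, T), TCP header (H). The NIC DMA-writes (⇐DMA) the received packet, H H data T, to host memory; the CPU then copies the data to its destination and computes the CRC (⇐copy+crc). Labels: "Choose".]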

Page 9: Autonomous NIC Offload

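Receive offload overview

[Figure: baseline and offload receive paths side by side, each through the Network, NIC, TCP/IP, NVMe-TCP and App layers. Legend: NVMe-TCP PDU header/trailer (H, T), TCP header (H). Baseline: the NIC DMA-writes (⇐DMA) H H data T and the CPU performs the copy and the CRC (⇐copy+crc). Offload: the NIC combines DMA, copy and CRC (⇐DMA+crc) by writing the data directly to its destination, while the headers and trailer (H H T) follow the regular ⇐DMA path. Labels: "Choose", "Combine DMA+copy+crc".]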

Page 10: Autonomous NIC Offload

Seamless integration: crc

• New SKB bit skb->ddp_crc
  • Used similarly to TLS’s skb->decrypted

• On transmit, skb->ddp_crc indicates CRC offload is expected

• On receive, skb->ddp_crc indicates no CRC errors in the packet’s payload
  • skb->ddp_crc==0 triggers software PDU CRC calculation
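The receive-side rule can be sketched as follows. This is a simplified model, not kernel code: `Skb` is a hypothetical stand-in for an SKB, and `zlib.crc32` is swapped in for the real PDU CRC only to keep the example dependency-free.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Skb:
    payload: bytes
    ddp_crc: bool  # set when the NIC found no CRC error in this packet


def pdu_data_ok(skbs, expected_crc, software_crc=zlib.crc32):
    """Trust hardware only if every packet of the PDU carries ddp_crc."""
    if all(skb.ddp_crc for skb in skbs):
        return True  # no software CRC work needed
    # At least one packet bypassed the offload: fall back to software.
    data = b"".join(skb.payload for skb in skbs)
    return software_crc(data) == expected_crc


good_crc = zlib.crc32(b"abcdef")
offloaded = pdu_data_ok([Skb(b"abc", True), Skb(b"def", True)], good_crc)
fallback = pdu_data_ok([Skb(b"abc", True), Skb(b"def", False)], good_crc)
corrupt = pdu_data_ok([Skb(b"abc", False), Skb(b"dXf", False)], good_crc)
```

A single packet without the bit is enough to trigger the software calculation for the whole PDU.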

Page 11: Autonomous NIC Offload

Seamless integration: copy

• NIC driver builds SKBs of packets on the wire
  • Packet headers from receive ring
  • Storage protocol headers/trailers from receive ring
  • Payload from destination buffers

[Figure: a received packet and the receive ring next to the application buffers. The receive ring holds previous packets, the Ethernet / IP / TCP headers of the received packet with no PDU data (it is not a DMA target buffer), and more packets. The application buffer for CID=X holds previous PDU data followed by the region labelled "PDU data: offload DMA-writes here".]

Page 12: Autonomous NIC Offload

Seamless integration: copy

• NIC driver builds SKBs of packets on the wire
  • Packet headers from receive ring
  • Storage protocol headers/trailers from receive ring
  • Payload from destination buffers

[Figure: a received packet and the receive ring next to the application buffers. The receive ring holds previous packets, the Ethernet / IP / TCP headers of the received packet with no PDU data (it is not a DMA target buffer), and more packets. The application buffer for CID=X holds previous PDU data followed by the region labelled "PDU data: offload DMA-writes here". Labels: skb_shinfo(skb).]

Page 13: Autonomous NIC Offload

Seamless integration: copy

• NIC driver builds SKBs of packets on the wire
  • Packet headers from receive ring
  • Storage protocol headers/trailers from receive ring
  • Payload from destination buffers

• Storage protocol skips copy
  • Iff (src == dst) before memcpy

[Figure: a received packet and the receive ring next to the application buffers. The receive ring holds previous packets, the Ethernet / IP / TCP headers of the received packet with no PDU data (it is not a DMA target buffer), and more packets. The application buffer for CID=X holds previous PDU data followed by the region labelled "PDU data: offload DMA-writes here". Labels: skb_shinfo(skb) …]
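The copy-skipping check can be sketched with a flat bytearray standing in for host memory. The layout (receive ring at offset 0, application buffer at offset 32) and the function name are assumptions made for the example.

```python
def copy_to_destination(memory, src, dst, length):
    """Copy within `memory`; return how many bytes were actually copied."""
    if src == dst:
        return 0  # the offload already DMA-wrote the payload in place
    memory[dst:dst + length] = memory[src:src + length]
    return length


RING, APP_BUF = 0, 32
memory = bytearray(64)

# Offloaded packet: the NIC wrote the payload straight into the application
# buffer, so the SKB fragment already points at the destination.
memory[APP_BUF:APP_BUF + 8] = b"payload1"
copied_offloaded = copy_to_destination(memory, APP_BUF, APP_BUF, 8)

# Non-offloaded packet: the payload sits in the receive ring and must be copied.
memory[RING + 8:RING + 16] = b"payload2"
copied_fallback = copy_to_destination(memory, RING + 8, APP_BUF + 8, 8)
```

The storage protocol code path stays the same in both cases; only the address comparison decides whether memcpy runs.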

Page 14: Autonomous NIC Offload

Seamless integration: copy

• Need to avoid network stack copies of data
  • Problem: skb_coalesce copies data from destination buffer back to SKB
  • Solution: Avoid it by reusing the skb->ddp_crc bit

• Need to map between destination pages and their identifiers
  • Upper layer protocol maintains mapping

Page 15: Autonomous NIC Offload

Hardware perspective


Page 16: Autonomous NIC Offload

NIC contexts

Dynamic state
• expected TCP seq
• current msg offset
• current msg size
• current msg CID
• CRC state

Static state
• CID to buffer map
• Protocol version
• Message format
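One way to picture the per-connection context is as two small records. The field names follow the slide; the types, defaults and the `in_sequence` helper are assumptions for this sketch and do not describe a hardware layout.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StaticState:
    protocol_version: int
    message_format: str
    cid_to_buffer: dict = field(default_factory=dict)  # CID -> buffer


@dataclass
class DynamicState:
    expected_tcp_seq: int
    msg_offset: int = 0   # bytes of the current message already processed
    msg_size: int = 0
    msg_cid: int = 0
    crc_state: int = 0    # running CRC over the current message


@dataclass
class NicContext:
    static: StaticState
    dynamic: DynamicState

    def in_sequence(self, tcp_seq):
        """Offload can continue only when the packet is the expected one."""
        return tcp_seq == self.dynamic.expected_tcp_seq


ctx = NicContext(StaticState(1, "nvme-tcp"), DynamicState(expected_tcp_seq=1000))
```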

Page 17: Autonomous NIC Offload

Transmit offload in-sequence

• NIC offload implementation is simple
  • Incrementally offload using NIC contexts

[Figure: TCP segments (TCPhdr 1 to TCPhdr 8) carrying a stream of messages, each starting with a size field, alongside the NIC contexts (dynamic and static state).]
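Incremental processing can be sketched as a parser that keeps only a few bytes of state between packets. The framing here is a toy assumption (a 4-byte little-endian size that counts the whole message), not the NVMe-TCP header, and `track_messages` is an illustrative name.

```python
def track_messages(packets, header_len=4):
    """Return the stream offsets at which messages start, one packet at a time."""
    starts = []
    hdr = b""       # header bytes seen so far (a header may straddle packets)
    remaining = 0   # body bytes left in the current message
    offset = 0      # stream offset of the first byte of the current packet
    for pkt in packets:
        i = 0
        while i < len(pkt):
            if remaining == 0:
                if not hdr:
                    starts.append(offset + i)
                take = min(header_len - len(hdr), len(pkt) - i)
                hdr += pkt[i:i + take]
                i += take
                if len(hdr) == header_len:
                    size = int.from_bytes(hdr, "little")
                    if size < header_len:
                        raise ValueError("malformed size field")
                    remaining = size - header_len
                    hdr = b""
            else:
                take = min(remaining, len(pkt) - i)
                remaining -= take
                i += take
        offset += len(pkt)
    return starts


def msg(body):
    return (4 + len(body)).to_bytes(4, "little") + body


stream = msg(b"aaaaa") + msg(b"bb") + msg(b"cccccccc")
packets7 = [stream[i:i + 7] for i in range(0, len(stream), 7)]
packets5 = [stream[i:i + 5] for i in range(0, len(stream), 5)]
```

The result does not depend on how the stream is cut into packets, which is what lets in-sequence traffic be offloaded packet by packet.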

Page 18: Autonomous NIC Offload

Transmit offload out-of-sequence

• Dynamic NIC context state is wrong for out-of-sequence packets

• Context recovery needs only the message prefix
  • Driver can get the prefix from the storage protocol layer

• Reuse TCP transmit buffer for storing data
  • TCP ACKs release data in storage protocol PDU granularity

[Figure: TCP segments (TCPhdr 1 to TCPhdr 8) carrying a stream of messages, each starting with a size field, alongside the NIC contexts (dynamic and static state).]
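Recovery from the message prefix can be sketched as recomputing the running CRC up to the resend point. Assumptions: the whole message is treated as CRC-covered data, `zlib.crc32` is swapped in for the real PDU CRC because it supports incremental updates in the standard library, and the sequence numbers are made up.

```python
import zlib


def recover_crc_state(message, msg_start_seq, resend_seq):
    """Rebuild (message offset, CRC state) needed to resume at resend_seq."""
    prefix_len = resend_seq - msg_start_seq
    if not 0 <= prefix_len <= len(message):
        raise ValueError("resend_seq is outside this message")
    # Only the bytes before the resend point are needed.
    return prefix_len, zlib.crc32(message[:prefix_len])


message = bytes(range(100))
offset, state = recover_crc_state(message, msg_start_seq=5000, resend_seq=5040)

# Resuming from the recovered state yields the same CRC as a full pass.
resumed_crc = zlib.crc32(message[offset:], state)
```

This is why the prefix must stay available until the PDU is fully acknowledged.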

Page 19: Autonomous NIC Offload

Receive offload in-sequence

• NIC offload implementation is simple
  • Incrementally offload using NIC contexts

• Hardware reports one bit per packet
  • Is packet CRC ok?

[Figure: sequence of received packets (1, 3, 5). Legend: CRC verified, Message header.]

Page 20: Autonomous NIC Offload

Receive offload retransmission

• Retransmissions bypass offload
  • Software fallback

[Figure: sequence of received packets (1, 3, 5). Legend: CRC verified, Non-verified message data, Message header.]

Page 21: Autonomous NIC Offload

Receive offload data reordering

• PDU data reordering
  • Hardware skips ahead to the next record
  • Continue offloading

[Figure: sequence of received packets (1, 3, 5, 6). Legend: CRC verified, Non-verified message data, Message header.]

Page 22: Autonomous NIC Offload

Receive offload header reordering

• PDU header reordering
  • Stops hardware NIC offloading
  • Software must recover NIC context to continue

[Figure: sequence of received packets (1, 3, 5, 6). Legend: CRC verified, Non-verified message data, Message header.]

Page 23: Autonomous NIC Offload

Receive offload recovery problem

• NIC context recovery on receive is non-trivial:
  • Stopping packets to recover NIC context is impossible

• Packets keep coming

• Software alone cannot recover during traffic

• Need to combine software and hardware

[Figure: sequence of received packets (1, 3, 5, 6). Legend: CRC verified, Non-verified message data, Message header.]

Page 24: Autonomous NIC Offload

Receive offload recovery solution

NIC context recovery relies on:

(1) Speculatively finding PDU message header magic pattern

[Figure: sequence of received packets (1, 3, 5, 6, 7). Legend: CRC verified message data, Non-verified message data, Message header, Speculative message header.]

Page 25: Autonomous NIC Offload

Receive offload recovery solution

NIC context recovery relies on:

(1) Speculatively finding PDU message header magic pattern

(2) Requesting software to confirm that this is indeed a PDU header, while

[Figure: sequence of received packets (1, 3, 5, 6, 7); hardware asks software "is it a PDU header?". Legend: CRC verified message data, Non-verified message data, Message header, Speculative message header.]

Page 26: Autonomous NIC Offload

Receive offload recovery solution

NIC context recovery relies on:

(1) Speculatively finding PDU message header magic pattern

(2) Requesting software to confirm that this is indeed a PDU header, while

(3) Tracking subsequent messages using the message header’s length field

[Figure: sequence of received packets (1, 3, 5, 6, 7). Legend: CRC verified message data, Non-verified message data, Message header, Speculative message header.]

Page 27: Autonomous NIC Offload

Receive offload recovery solution

NIC context recovery relies on:

(1) Speculatively finding PDU message header magic pattern

(2) Requesting software to confirm that this is indeed a PDU header, while

(3) Tracking subsequent messages using the message header’s length field

(4) Resuming offload if software confirms the HW speculation

[Figure: sequence of received packets (1, 3, 5, 6, 7, 8, 9); software answers "Yes, it was a PDU header in P5". Legend: CRC verified message data, Non-verified message data, Message header, Speculative message header.]
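The four recovery steps can be read as a small state machine. This is a sketch under assumptions: the state names, method names and sequence numbers are hypothetical, and real hardware would match the header pattern on raw bytes rather than receive it as arguments.

```python
from enum import Enum, auto


class State(Enum):
    OFFLOADING = auto()
    SEARCHING = auto()  # sync lost, scanning for the header magic pattern
    TRACKING = auto()   # candidate found, waiting for software to confirm


class RxResync:
    def __init__(self):
        self.state = State.OFFLOADING
        self.candidate_seq = None
        self.next_header_seq = None

    def lose_sync(self):
        self.state = State.SEARCHING

    def hw_found_candidate(self, seq, length):
        """Steps (1) and (2): speculate, then ask software about this seq."""
        if self.state is not State.SEARCHING:
            return None
        self.state = State.TRACKING
        self.candidate_seq = seq
        self.next_header_seq = seq + length
        return seq  # the confirmation request sent to software

    def hw_saw_header(self, seq, length):
        """Step (3): follow length fields while the answer is pending."""
        if self.state is State.TRACKING and seq == self.next_header_seq:
            self.next_header_seq = seq + length

    def sw_response(self, seq, is_header):
        """Step (4): resume on confirmation, otherwise search again."""
        if self.state is State.TRACKING and seq == self.candidate_seq:
            self.state = State.OFFLOADING if is_header else State.SEARCHING


confirmed = RxResync()
confirmed.lose_sync()
request = confirmed.hw_found_candidate(seq=5000, length=300)
confirmed.hw_saw_header(seq=5300, length=200)
confirmed.sw_response(5000, is_header=True)

rejected = RxResync()
rejected.lose_sync()
rejected.hw_found_candidate(seq=7000, length=64)
rejected.sw_response(7000, is_header=False)
```

Tracking continues while software is deciding, so packets never have to be stopped.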

Page 28: Autonomous NIC Offload

APIs and implementation


Page 29: Autonomous NIC Offload

ULP DDP infrastructure

• ULP DDP interposes between NIC drivers and storage protocols

• Protocol agnostic

• Vendor agnostic

• First users are NVMe-TCP and Mellanox drivers

[Figure: the ULP-DDP infrastructure sits between storage protocols (NVMe-TCP, iSCSI) and NIC drivers (Mellanox, ???).]

Page 30: Autonomous NIC Offload

ULP DDP APIs

• Setup/teardown per-connection state

• Setup/teardown mapping between pages and their identifiers

• Protocol resynchronization
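The three API groups could be shaped roughly like the interface below. Every name here is hypothetical and chosen for illustration; it is not the kernel's ULP DDP interface, and `FakeNic` only records calls in memory.

```python
from abc import ABC, abstractmethod


class UlpDdpOps(ABC):
    """Hypothetical driver-facing interface; method names are illustrative."""

    @abstractmethod
    def add_connection(self, conn_id, config): ...

    @abstractmethod
    def del_connection(self, conn_id): ...

    @abstractmethod
    def map_buffer(self, conn_id, buf_id, pages): ...

    @abstractmethod
    def unmap_buffer(self, conn_id, buf_id): ...

    @abstractmethod
    def resync(self, conn_id, tcp_seq, is_header): ...


class FakeNic(UlpDdpOps):
    """In-memory stand-in for a NIC driver."""

    def __init__(self):
        self.connections = {}
        self.resync_log = []

    def add_connection(self, conn_id, config):
        self.connections[conn_id] = {"config": config, "buffers": {}}

    def del_connection(self, conn_id):
        del self.connections[conn_id]

    def map_buffer(self, conn_id, buf_id, pages):
        self.connections[conn_id]["buffers"][buf_id] = pages

    def unmap_buffer(self, conn_id, buf_id):
        del self.connections[conn_id]["buffers"][buf_id]

    def resync(self, conn_id, tcp_seq, is_header):
        self.resync_log.append((conn_id, tcp_seq, is_header))


nic = FakeNic()
nic.add_connection(1, {"protocol": "nvme-tcp"})
nic.map_buffer(1, buf_id=7, pages=["page0"])
```

Keeping the interface free of protocol-specific arguments is what would allow other storage protocols and other vendors to plug in.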


Page 31: Autonomous NIC Offload

NVMe-TCP setup per-connection state

• Offload begins after all handshakes complete

• Configure NVMe queue limits (max SGL, max IO size, etc.)

[Figure annotation: "Start offload here"]

Page 32: Autonomous NIC Offload

NVMe-TCP mapping pages

• Map buffers before IO send

• Unmap on IO completion
  • Added asynchronous unmap to improve performance
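The map/unmap lifecycle can be sketched as below. This assumes "asynchronous" means the completion path queues the unmap and returns instead of waiting for the NIC; the class and method names are hypothetical.

```python
from collections import deque


class OffloadQueue:
    def __init__(self):
        self.mapped = {}               # CID -> destination buffer
        self.pending_unmaps = deque()  # unmaps requested but not finished
        self.completed = []            # IOs handed back to the upper layer

    def send_io(self, cid, buffer):
        self.mapped[cid] = buffer      # map before the IO is sent
        # ... the request would be transmitted here ...

    def io_done(self, cid):
        # Do not wait for the NIC: queue the unmap and return immediately.
        self.pending_unmaps.append(cid)

    def process_unmaps(self):
        """Stand-in for the NIC reporting that queued unmaps have finished."""
        while self.pending_unmaps:
            cid = self.pending_unmaps.popleft()
            del self.mapped[cid]
            self.completed.append(cid)  # only now is the buffer released


q = OffloadQueue()
q.send_io(1, bytearray(8))
q.send_io(2, bytearray(8))
q.io_done(1)
still_mapped = 1 in q.mapped and q.completed == []
q.process_unmaps()
```

The buffer is released only after the unmap finishes, so the NIC can never write into memory that has already been handed back.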

Page 33: Autonomous NIC Offload

Netdev features

• We have run out of netdev feature bits!

• Proposal: override __UNUSED_NETIF_F_1
  • Single bit for both receive and transmit

Page 34: Autonomous NIC Offload

Future work

• Integration with TLS
  • Data-path POC working

• Need a solution for the TLS handshake in NVMe-TCP