Page 1

StackMap: Low-Latency Networking with the OS Stack and Dedicated NICs

Kenichi Yasukata (Keio University*), Michio Honda, Douglas Santry, Lars Eggert (NetApp)

June 22nd @ USENIX ATC 2016

© 2016 NetApp, Inc. All rights reserved.

*Work while an intern at NetApp

Page 2

Overview

§ Message-oriented communication over TCP is common
  § e.g., HTTP, memcached, CDNs
§ Linux network stack can serve 1KB messages only at 3.5 Gbps w/ a single core
§ Improve socket API?
  § Limited improvements
§ User-space TCP/IP stack?
  § Maintaining and updating today’s complex TCP is hard

[Figure: throughput [Gb/s] (0-4) vs. concurrent TCP connections (1-100); series: Linux, Seastar, StackMap]

StackMap achieves high performance with the OS TCP/IP

Page 3

Background

§ Message-oriented communication over TCP (e.g., HTTP, memcached)
  § Many concurrent connections
  § Small messages
  § High packet rates

[Figure: stack diagram. User space: StackMap app, regular app. Kernel: socket API, TCP/IP/Eth, Linux packet I/O, netmap, packet buffers, device drivers. NIC below. Labeled paths: request (e.g., HTTP GET), response (e.g., HTTP OK).]

Page 4

Message Latency Problem

§ Many requests are processed in each epoll_wait() cycle
  § New requests are queued in the kernel

/* Simplified event loop; call arguments are abbreviated. */
while (1) {
    n = epoll_wait(fds);
    /* Serve every ready descriptor before polling again. */
    for (i = 0; i < n; i++) {
        read(fds[i], buf);
        http_ok(buf);
        write(fds[i], buf);
    }
}

[Figure: # of descriptors returned by epoll_wait() (0-100) vs. concurrent TCP connections (0-100)]

[Figure: latency [µs] (0-500) vs. concurrent TCP connections (0-100); series: 99th %ile latency, mean latency]
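A rough way to see why latency grows with the number of descriptors returned per epoll_wait() cycle is a toy queuing model. This sketch is not from the slides: the function names are hypothetical, and it assumes every request in a batch is ready when the cycle starts and takes the same time to serve.

```c
#include <assert.h>

/* Mean time (us) a request waits before the loop reaches it, when one
 * epoll_wait() cycle returns n ready descriptors and each request takes
 * per_req_us to serve. Request k (0-based) waits k * per_req_us. */
double mean_wait_us(int n, double per_req_us)
{
    assert(n > 0);
    return per_req_us * (n - 1) / 2.0;
}

/* Wait of the last request in the batch: the worst case within one cycle. */
double max_wait_us(int n, double per_req_us)
{
    assert(n > 0);
    return per_req_us * (n - 1);
}
```

Under these assumptions, a batch of 40 descriptors at 3.75 us of server processing each leaves the last request waiting 39 × 3.75 = 146.25 us before it is even read. The model ignores requests that arrive during the cycle, so it only illustrates the trend.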

Page 6

Where Could We Improve?

§ Processing cost of TCP/IP protocol is not high
  § TCP/IP takes 1.48 us, out of 3.75 us server processing
§ ½ RTT reported by the client app is 9.75 us
  § The remaining 6 us comes from minimum hard/soft indirection (5.77 us)
  § netmap-based ping-pong (network stack bypass) reports 5.77 us

[Figure: server processing time (us) for an HTTP GET (96B) request and HTTP OK (127B) response, split into Pkt. I/O, TCP/IP, Socket/VFS, and App stages; values as extracted: 0.60, 0.72, 0.53, 0.48, 0.22, 0.76, 0.43; annotation: epoll_wait() processing delay]
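The numbers on this slide can be cross-checked with simple arithmetic. The sketch below assumes that the seven per-stage values add up to the server processing time and that 0.72 and 0.76 are the two TCP/IP stages (receive and transmit), a pairing consistent with the 1.48 us quoted above; the stage assignment is not recoverable with certainty from the extracted figure.

```latex
\begin{align*}
t_{\text{server}} &= 0.60 + 0.72 + 0.53 + 0.48 + 0.22 + 0.76 + 0.43
                   = 3.74~\mu\text{s} \approx 3.75~\mu\text{s} \\
t_{\text{TCP/IP}} &= 0.72 + 0.76 = 1.48~\mu\text{s}
                   \quad (\approx 39\%\ \text{of}\ 3.75~\mu\text{s}) \\
t_{\text{rest}}   &= 9.75 - 3.75 = 6.00~\mu\text{s}
                   \quad \text{vs. } 5.77~\mu\text{s for the netmap ping-pong}
\end{align*}
```

The remaining 6.00 us is within 0.23 us of the stack-bypass baseline, which is why the slide treats it as the minimum hardware and software indirection cost rather than something the network stack could remove.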

Page 7

Takeaway

§ Conventional system introduces end-to-end latency of 10s to 100s of us
  § Result of processing delays
§ Socket API comes at a significant cost
  § read()/write()/epoll_wait() processing delay
§ Packet I/O is expensive
§ TCP/IP protocol processing is relatively cheap

We can use the feature-rich kernel TCP/IP implementation, but need to improve API and packet I/O

[Figure: stack diagram with StackMap app and regular app in user space; socket API, TCP/IP/Eth, Linux packet I/O, netmap, packet buffers, and device drivers in the kernel; NIC below]

Page 9

StackMap Approach

§ Dedicating a NIC to an application
  § Common for today’s high-performance systems
  § Similar to OS-bypass TCP/IPs
§ TCP/IP stack in the kernel
  § State-of-the-art features
  § Active updates and maintenance

[Figure: two stack diagrams with StackMap app and regular app in user space and socket API, TCP/IP/Eth, Linux packet I/O, and device drivers in the kernel above the NIC; the first also shows netmap and packet buffers with callouts 1-4]

Page 10

StackMap Architecture

[Figure: StackMap architecture. User space: StackMap app, regular app. Kernel: socket API, TCP/IP/Eth, Linux packet I/O, netmap framework, packet buffers, device drivers. NIC below. Callouts 1-4 match the numbered items.]

1. Socket API for control path
  § socket(), bind(), listen()
2. Netmap API for data path (extended)
  § Syscall and packet I/O batching, zero copy, run-to-completion
3. Persistent, fixed-size sk_buffs
  § Efficiently call into kernel TCP/IP
4. Static packet buffers and DMA mapping
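Item 1 keeps connection setup on the ordinary socket API. A minimal sketch of that control path follows, using only the standard POSIX socket(), bind(), and listen() calls. `open_listener` is a hypothetical helper name, and the step that attaches the descriptor to the netmap data path is left as a comment because the slides do not give that interface.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a listening TCP socket on all local addresses.
 * Returns the descriptor, or -1 on failure. Port 0 lets the kernel
 * pick a free port. open_listener is a hypothetical helper name. */
int open_listener(uint16_t port, int backlog)
{
    assert(backlog > 0);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }

    /* A StackMap app would now register this descriptor with the data
     * path; that call is not shown because its interface is not given
     * in the slides. */
    return fd;
}
```

Keeping setup on the socket API means existing connection-management code needs no changes; only the per-message read and write path moves to the netmap-style rings.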

Page 12

StackMap Data Path API

§ TX
  § Put data and fd in each slot
  § Advance the head pointer
  § Syscall to start network stack processing and transmission
§ RX
  § Kernel puts fd on each buffer
  § App can traverse a ring by descriptors
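The TX steps above can be sketched with a small simulated ring. This is not netmap's `struct netmap_ring` or the StackMap API: the types `tx_slot` and `tx_ring`, the functions `tx_put` and `tx_sync`, and the sizes are all hypothetical, and `tx_sync` only stands in for the single syscall that would hand the queued slots to the kernel stack.

```c
#include <assert.h>
#include <string.h>

#define RING_SLOTS 8u    /* hypothetical ring size */
#define SLOT_BYTES 2048u /* hypothetical fixed buffer size */

struct tx_slot {
    int fd;                 /* connection this payload belongs to */
    size_t len;             /* payload length in bytes */
    char buf[SLOT_BYTES];   /* payload written in place by the app */
};

struct tx_ring {
    unsigned head;          /* count of slots queued by the app */
    unsigned sent;          /* count of slots already handed to the stack */
    struct tx_slot slot[RING_SLOTS];
};

/* Queue one message: put data and fd in the next slot and advance head.
 * No syscall happens here. Returns 0 on success, -1 if the ring is full
 * or the payload does not fit in one slot. */
int tx_put(struct tx_ring *r, int fd, const void *data, size_t len)
{
    assert(r != NULL);
    if (len > SLOT_BYTES || r->head - r->sent == RING_SLOTS)
        return -1;
    struct tx_slot *s = &r->slot[r->head % RING_SLOTS];
    s->fd = fd;
    s->len = len;
    memcpy(s->buf, data, len);
    r->head++;
    return 0;
}

/* Stand-in for the one syscall that starts stack processing and
 * transmission for every queued slot. Returns the batch size. */
unsigned tx_sync(struct tx_ring *r)
{
    assert(r != NULL);
    unsigned batch = r->head - r->sent;
    r->sent = r->head;
    return batch;
}
```

Queuing three responses with `tx_put` and then calling `tx_sync` once covers all three, which is the syscall batching the architecture slide lists, in contrast to one write() per connection.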

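[Figure: TX ring with head and tail pointers, each slot holding data and an fd. RX ring with head and tail pointers and slots [0]-[5] holding fd4, fd3, fd4, fd4, fd5, fd3, alongside an FD Array and a Scratchpad table with nxt and idx rows.]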
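One plausible reading of the RX figure is that an FD array records the first slot for each descriptor and a scratchpad links each slot to the next slot with the same descriptor, so the app can walk the ring per connection. The sketch below implements that reading. The names `fd_index`, `fd_index_build`, and `fd_index_slots`, and the size limits, are hypothetical and do not come from the StackMap code.

```c
#include <assert.h>

#define MAX_FD 64      /* hypothetical bound on descriptor values */
#define RING_SLOTS 64  /* hypothetical ring size */
#define NO_SLOT (-1)

struct fd_index {
    int head[MAX_FD];     /* head[fd]: first slot holding data for fd */
    int last[MAX_FD];     /* last[fd]: latest slot for fd, used while building */
    int nxt[RING_SLOTS];  /* nxt[i]: next slot with the same fd as slot i */
};

/* Build the index in one pass over the slots that were filled.
 * slot_fd[i] is the descriptor stored in slot i. */
void fd_index_build(struct fd_index *ix, const int *slot_fd, int nslots)
{
    assert(nslots >= 0 && nslots <= RING_SLOTS);
    for (int fd = 0; fd < MAX_FD; fd++)
        ix->head[fd] = ix->last[fd] = NO_SLOT;
    for (int i = 0; i < nslots; i++) {
        int fd = slot_fd[i];
        assert(fd >= 0 && fd < MAX_FD);
        ix->nxt[i] = NO_SLOT;
        if (ix->head[fd] == NO_SLOT)
            ix->head[fd] = i;            /* first slot for this fd */
        else
            ix->nxt[ix->last[fd]] = i;   /* link from the previous slot */
        ix->last[fd] = i;
    }
}

/* Collect, in arrival order, the slots that belong to one descriptor.
 * Returns how many slot indices were written to out. */
int fd_index_slots(const struct fd_index *ix, int fd, int *out, int max_out)
{
    assert(fd >= 0 && fd < MAX_FD);
    int n = 0;
    for (int i = ix->head[fd]; i != NO_SLOT && n < max_out; i = ix->nxt[i])
        out[n++] = i;
    return n;
}
```

For the slot contents shown in the figure (fd4, fd3, fd4, fd4, fd5, fd3), the index yields slots 0, 2, 3 for fd 4, slots 1, 5 for fd 3, and slot 4 for fd 5, so all data for one connection can be consumed together without scanning the whole ring per descriptor.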

Page 13

Experimental Results

§ Implementation
  § Linux 4.2 with 228 LoC changes
  § netmap with 56 LoC changes
  § A new kernel module with 2269 LoC
§ Setup
  § Two machines with Xeon E5-2680 v2 (2.8-3.6 GHz) and Intel 82599 10 GbE NIC
  § Server: Linux or StackMap
  § Client: Linux with the wrk HTTP benchmark tool or the memaslap memcached benchmark tool

Page 14

Basic Performance

§ Simple HTTP server
  § Serving 1KB messages (single core)

[Figure: throughput [Gb/s] (0-8) vs. concurrent TCP connections (0-100); series: Linux, StackMap]

[Figure: latency [µs] (0-500) vs. concurrent TCP connections (0-100); series: Linux (99th %ile), Linux (mean), StackMap (99th %ile), StackMap (mean)]

Page 15

Memcached Performance

§ Serving 1KB messages
  § Single core
  § Seastar is a fast user-space TCP/IP on top of DPDK*
§ Serving 64B messages
  § 1-8 CPU cores

[Figure: throughput [Gb/s] (0-4) vs. concurrent TCP connections (1-100); series: Linux, Seastar, StackMap]

[Figure: throughput [Gb/s] (0-3) vs. CPU cores [#] (1, 4, 8); series: Linux, Seastar, StackMap]

[Figure: latency [µs] (0-400) vs. concurrent TCP connections (1-100); series: Linux, Seastar, StackMap]

*http://www.seastar-project.org/

Page 16

Discussion

§ What makes StackMap fast?
  § Techniques used by OS-bypass TCP/IPs
  § Run-to-completion, static packet buffers, zero copy, syscall and I/O batching, and new API
§ Limitations and Future Work
  § Safely sharing packet buffers
  § If kernel-owned buffers are modified by a misbehaving app, TCP might fall into inconsistent state

Page 17

Related Work

§ Kernel-bypass TCP/IPs
  § IX [OSDI’14], Arrakis [OSDI’14], UTCP [CCR’14], Sandstorm [SIGCOMM’14], mTCP [NSDI’14], Seastar
§ Socket API enhancements
  § MegaPipe [OSDI’12], FlexSC [OSDI’10], KCM [Linux]
§ Improving OS stack with fast packet I/O
  § mSwitch [SOSR’15]
§ In-stack improvement
  § FastSocket [ASPLOS’16]
§ Running kernel stack in user-space
  § Rump [AsiaBSDCon’09], NUSE [netdev’15]

Page 18

Conclusion

§ Message-oriented communication over TCP
§ Kernel TCP/IP is fast
  § But socket API and packet I/O are slow
§ We can bring most of the techniques used by kernel-bypass stacks into the OS stack
§ Latency reduction by 4-80% (average) or 2-70% (99th %ile)
§ Throughput improvement by 4-391%