© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
Achieving 10 Gb/s Using Xen Para-virtualized Network Drivers
Kaushik Kumar Ram*, J. Renato Santos+, Yoshio Turner+, Alan L. Cox*, Scott Rixner*
+HP Labs *Rice University
Xen PV Driver on 10 Gig Networks
• Focus of this talk: RX
[Chart: Throughput on a single TCP connection (netperf). Rate (Gb/s), Xen vs. Linux, for RX, TX, and TX sendfile.]
Network Packet Reception in Xen
[Diagram: Network packet reception in Xen. The driver domain (physical driver, bridge, backend driver) and the guest domain (frontend driver) communicate over an I/O channel, with Xen and the NIC underneath. Steps: 1. Frontend posts a grant on the I/O channel; 2. NIC DMAs the incoming packet into a driver domain buffer; 3. IRQ; 4. Bridge demultiplexes the packet to the backend; 5. Backend grant-copies the packet into the guest buffer; 6. Backend sends an event to the guest; 7. Frontend pushes the packet into the network stack.]
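To make the conventional path concrete, here is a minimal user-space model of the grant copy in step 5; struct grant, grant_copy, and the fixed page size are inventions of this sketch, while the real backend has Xen perform the copy through a grant-table hypercall (GNTTABOP_copy):

/* Minimal model of step 5 (grant copy), for illustration only.
 * In real netback, Xen copies the packet from a driver-domain buffer
 * into the page named by the guest's grant reference. */
#include <string.h>
#include <stdio.h>

#define PAGE_SIZE 4096

struct grant {            /* stand-in for a guest grant reference */
    unsigned char *page;  /* guest RX buffer page */
};

/* Copy one received packet into the guest page the grant refers to. */
static int grant_copy(struct grant *gr, const unsigned char *pkt, size_t len)
{
    if (len > PAGE_SIZE)
        return -1;                /* packet must fit in one granted page */
    memcpy(gr->page, pkt, len);   /* models the hypervisor-mediated copy */
    return 0;
}

int main(void)
{
    unsigned char guest_page[PAGE_SIZE];
    struct grant gr = { guest_page };
    const unsigned char pkt[] = "incoming packet payload";

    if (grant_copy(&gr, pkt, sizeof(pkt)) == 0)
        printf("copied %zu bytes into guest buffer\n", sizeof(pkt));
    return 0;
}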
Mechanisms to reduce driver domain cost:
• Use of multi-queue NIC
− Avoid data copy
− Packet demultiplexing in hardware
• Grant reuse mechanism
− Reduce the cost of grant operations
Using Multi-Queue NICs
[Diagram: Packet reception with a multi-queue NIC, one RX queue per guest. Steps: 1. Frontend posts a grant on the I/O channel; 2. Backend maps the guest buffer; 3. Backend posts the buffer on the guest's device queue; 4. NIC demultiplexes the incoming packet by guest MAC address; 5. NIC DMAs the packet directly into the guest buffer; 6. IRQ; 7. Backend unmaps the buffer; 8. Backend sends an event to the guest; 9. Frontend pushes the packet into the network stack.]
• Advantages of multi-queue:
− Avoid data copy
− Avoid the software bridge
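To make step 4 in the diagram concrete, here is a minimal sketch of MAC-based demultiplexing into per-guest RX queues in plain user-space C; all types, names, and the example MAC addresses are invented, and in the real system this selection happens in NIC hardware:

/* Minimal model of step 4: select the per-guest RX queue by destination
 * MAC address. */
#include <stdio.h>
#include <string.h>

#define NUM_GUESTS 2

struct rx_queue {
    int guest_id;
    unsigned char mac[6];   /* MAC address assigned to this guest */
};

static const struct rx_queue queues[NUM_GUESTS] = {
    { 0, {0x00, 0x16, 0x3e, 0x00, 0x00, 0x01} },
    { 1, {0x00, 0x16, 0x3e, 0x00, 0x00, 0x02} },
};

/* Return the RX queue whose MAC matches the packet's destination. */
static const struct rx_queue *demux(const unsigned char dst_mac[6])
{
    for (int i = 0; i < NUM_GUESTS; i++)
        if (memcmp(queues[i].mac, dst_mac, 6) == 0)
            return &queues[i];
    return NULL;   /* unknown MAC: falls back to the default queue */
}

int main(void)
{
    const unsigned char dst[6] = {0x00, 0x16, 0x3e, 0x00, 0x00, 0x02};
    const struct rx_queue *q = demux(dst);
    if (q)
        printf("DMA directly into a buffer posted by guest %d\n", q->guest_id);
    return 0;
}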
Performance Impact of Multi-queue
• Savings due to multi-queue:
− grant copy
− bridge
• Most of the remaining cost: grant hypercalls (grant + Xen functions)
[Chart: Driver domain CPU cost (cycles/packet) for current Xen vs. multi-queue, broken down into Xen, Xen grant, Linux other, user copy, Linux grant, mm, mem*, bridge, network, netback, and driver components.]
Using Grants with Multi-queue NIC
[Diagram: Grant usage with a multi-queue NIC. Steps: 1. Map grant hypercall; 2. Use page for I/O; 3. Unmap grant hypercall.]
• Multi-queue replaces one grant hypercall (copy) with two hypercalls (map/unmap)
• Grant hypercalls are expensive
− Map/unmap calls on every I/O operation
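To see why this matters at 10 Gb/s rates, here is a toy counting model of the per-I/O map/unmap pattern; the function names are invented stand-ins for the map and unmap grant-table hypercalls:

/* Counting model: with multi-queue but no grant reuse, every received
 * packet costs two hypercalls. */
#include <stdio.h>

static long hypercalls;   /* simulated hypercall count */

static void map_grant(void)   { hypercalls++; }
static void unmap_grant(void) { hypercalls++; }

static void receive_packet(void)
{
    map_grant();     /* 1. map the guest RX buffer into the driver domain */
    /* 2. post the buffer on the device queue; NIC DMAs the packet */
    unmap_grant();   /* 3. unmap once the I/O completes */
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        receive_packet();
    printf("%ld hypercalls for 1000 packets\n", hypercalls);  /* prints 2000 */
    return 0;
}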
Reducing Grant Cost
• Grant reuse
− Do not revoke the grant after the I/O completes
− Keep the buffer page in a pool of unused I/O pages
− Reuse already-granted pages from the buffer pool for future I/O operations
− Avoids map/unmap on every I/O (see the pool sketch below)
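Here is a minimal sketch of such a pool, assuming a simple LIFO free list; all names and sizes are invented, and granting/revoking a page is reduced to a hypercall counter. In steady state, get_buffer and put_buffer recycle granted pages without any hypercall:

/* Model of grant reuse: granted RX pages sit in a free pool and are
 * recycled across I/O operations. */
#include <stdio.h>

#define POOL_MAX 256

static int pool[POOL_MAX];   /* indices of already-granted, unused pages */
static int pool_top;
static long hypercalls;

static int get_buffer(void)
{
    if (pool_top > 0)
        return pool[--pool_top];   /* reuse: no hypercall needed */
    hypercalls++;                  /* pool empty: grant a fresh page */
    static int next_page;
    return next_page++;
}

static void put_buffer(int page)
{
    if (pool_top < POOL_MAX)
        pool[pool_top++] = page;   /* keep the grant, return page to pool */
    else
        hypercalls++;              /* pool full: revoke this grant */
}

int main(void)
{
    for (int i = 0; i < 1000; i++) {
        int pg = get_buffer();     /* post to device queue, DMA, deliver */
        put_buffer(pg);            /* I/O done: recycle instead of revoking */
    }
    printf("%ld hypercalls for 1000 packets\n", hypercalls);  /* prints 1 */
    return 0;
}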
Revoking a Grant when the Page is Mapped in the Driver Domain
• The guest may need to reclaim an I/O page for other uses (e.g., memory pressure in the guest)
• The page must be unmapped in the driver domain before the guest kernel can use it
− To preserve memory isolation (e.g., to protect against driver bugs)
• Revoking the grant requires a handshake between frontend and backend
− This can be slow, especially if the driver domain is not running
Approach to Avoid Handshake when Revoking Grants
• Observation: with a multi-queue NIC there is no need to map the guest page into the driver domain
− Software does not need to look at the packet header, since demux is performed in the device
− Only the page address is needed for the DMA operation
• Approach: replace the grant map hypercall with a shared memory interface to the hypervisor
− A shared memory table provides the translation from guest grant to page address
− No need to unmap the page when the guest revokes a grant (no handshake)
Software I/O Translation Table
[Diagram: Packet reception using the SIOTT. Steps: 1. Guest creates a grant for a buffer page; 2. Guest issues the set hypercall; 3. Xen validates, pins, and updates the SIOTT ("pg" and "use" fields, indexed by grant reference); 4. Frontend sends the grant over the I/O channel; 5. Backend sets the "use" field; 6. Backend gets the page address; 7. Page is used for I/O (DMA, then event to the guest); 8. Backend resets the "use" field; 9. Guest issues the clear hypercall; 10. Xen checks the "use" field and revokes the grant.]
• SIOTT: software I/O translation table
− Indexed by grant reference
− "pg" field: guest page address & permission
− "use" field: indicates whether the grant is in use by the driver domain
• set/clear hypercalls
− Invoked by the guest
− set validates the grant, pins the page, and writes the page address to the SIOTT
− clear requires that "use" = 0
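The following user-space model follows the description above; the table size and all names are assumptions of this sketch, and grant validation and page pinning are elided:

/* Model of the SIOTT and its set/clear hypercalls. */
#include <stdio.h>
#include <stdint.h>

#define SIOTT_SIZE 1024

struct siott_entry {
    uintptr_t pg;   /* guest page address & permission (0 = empty) */
    int use;        /* set by the backend while the grant is in use */
};

static struct siott_entry siott[SIOTT_SIZE];  /* shared with Xen */

/* Guest "set" hypercall: validate the grant, pin the page, publish it. */
static int siott_set(unsigned ref, uintptr_t page_addr)
{
    if (ref >= SIOTT_SIZE)
        return -1;               /* real code also validates the grant */
    siott[ref].pg = page_addr;   /* pinning is elided in this model */
    return 0;
}

/* Guest "clear" hypercall: only succeeds when the backend is not using
 * the grant, so no frontend/backend handshake is needed. */
static int siott_clear(unsigned ref)
{
    if (ref >= SIOTT_SIZE || siott[ref].use)
        return -1;               /* busy: guest must retry later */
    siott[ref].pg = 0;
    return 0;
}

int main(void)
{
    siott_set(42, 0x1000);           /* 0x1000: stand-in page address */
    siott[42].use = 1;               /* backend marks the grant in use */
    printf("clear while in use: %d\n", siott_clear(42));   /* -1 */
    siott[42].use = 0;               /* backend finished the I/O */
    printf("clear when idle: %d\n", siott_clear(42));      /* 0 */
    return 0;
}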
Grant Reuse: Avoid pin/unpin hypercall on every I/O
[Diagram: Grant reuse with an I/O buffer pool in the guest. Steps: 1. Guest creates a grant; 2. set hypercall; 3. Xen validates, pins, and updates the SIOTT; 4. After I/O, the buffer is returned to the pool and the grant is kept; 5. Buffer & grant are reused from the pool and the page is used for I/O (DMA, event); 6. Buffer is returned to the pool again, keeping the grant; 7. On kernel memory pressure, 8. the guest issues the clear hypercall; 9. the SIOTT entry is cleared; 10. the grant is revoked; 11. the page is returned to the kernel.]
Performance Impact of Grant Reuse w/ Software I/O Translation Table
[Chart: Driver domain CPU cost (cycles/packet) for current Xen, multi-queue, and SIOTT w/ grant reuse, broken down into Xen, Xen grant, Linux other, user copy, Linux grant, mm, mem*, bridge, network, netback, and driver components; the cost saving comes from eliminated grant hypercalls.]
Impact of optimizations on throughput
[Chart: Data rate (Gb/s) for current Xen, multi-queue w/ grant reuse, and Linux.]
• Multi-queue w/ grant reuse significantly reduces driver domain cost
• The bottleneck shifts from the driver domain to the guest
− Higher cost in the guest than in native Linux still limits throughput in Xen
[Chart: CPU utilization (%) for current Xen, multi-queue w/ grant reuse, and Linux, split between driver domain and guest.]
Additional optimizations at guest frontend driver
• LRO (Large Receive Offload) support at the frontend
− Consecutive packets on the same connection are combined into one large packet
− Reduces the cost of processing packets in the network stack
• Software prefetch (see the sketch after this list)
− Prefetch the next packet and socket buffer struct into the CPU cache while processing the current packet
− Reduces cache misses at the frontend
• Avoid full-page buffers
− Use half-page (2 KB) buffers (max packet size is 1500 bytes)
− Reduces the TLB working set and thus TLB misses
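As an illustration of the prefetch optimization, this sketch uses GCC's __builtin_prefetch on an invented packet ring; the kernel's netfront would instead apply the prefetch() helper to the next packet and socket buffer while handling the current one:

/* Pull the next packet toward the cache while processing this one,
 * hiding the memory latency of the following iteration. */
#include <stdio.h>

#define RING_SIZE 256

struct pkt {
    char data[2048];     /* half-page buffer (max packet is 1500 bytes) */
    int  len;
};

static struct pkt ring[RING_SIZE];

static void process(struct pkt *p) { (void)p; /* deliver to network stack */ }

static void rx_poll(int count)
{
    for (int i = 0; i < count; i++) {
        __builtin_prefetch(&ring[(i + 1) % RING_SIZE]);  /* next packet */
        process(&ring[i % RING_SIZE]);                   /* current packet */
    }
}

int main(void)
{
    rx_poll(64);
    puts("processed 64 packets with next-packet prefetch");
    return 0;
}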
Performance impact of guest frontend optimizations
• Optimizations bring the CPU cost in the guest close to native Linux
• Remaining cost difference:
− Higher cost in netfront than in the physical driver
− Xen functions to send and deliver events
[Charts: Guest domain CPU cost (cycles/packet) and data rate (Gb/s) for multi-queue w/ grant reuse, frontend w/ LRO, frontend w/ prefetch, reduced buffer size (2 KB), and Linux, with cost broken down into Xen, Xen grant, Linux other, user copy, Linux grant, mm, mem, network, and driver components.]
Impact of all optimizations on throughput
• Multi-queue with software optimizations achieves the same throughput as direct I/O (~8 Gb/s)
• Two or more guests are able to saturate the 10 gigabit link
[Chart: Throughput for the current PV driver, the optimized PV driver (1 guest), the optimized PV driver (2 guests), direct I/O (1 guest), and Linux.]
Conclusion
• Use of multi-queue support in modern NICs enables high-performance networking with Xen PV drivers
− Attractive alternative to direct I/O
  • Same throughput, although with some additional CPU cycles in the driver domain
  • Avoids hardware dependence in the guests
− A light driver domain enables scalability for multiple guests
  • The driver domain can now handle 10 Gb/s data rates
  • Multiple guests can leverage multiple CPU cores and saturate the 10 gigabit link
Status
• Status
− Performance results were obtained on a modified netfront/netback implementation using the original Netchannel1 protocol
− Currently porting the mechanisms to Netchannel2
  • Basic multi-queue is already available in the public netchannel2 tree
  • Additional software optimizations are still under discussion with the community and should be included in netchannel2 soon
• Thanks to
− Mitch Williams and John Ronciak from Intel, for providing samples of Intel NICs and for adding multi-queue support to their driver
− Ian Pratt, Steven Smith, and Keir Fraser, for helpful discussions