© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
Achieving 10 Gb/s Using Xen Para-virtualized Network Drivers
Kaushik Kumar Ram*, J. Renato Santos+, Yoshio Turner+, Alan L. Cox*, Scott Rixner*
+HP Labs *Rice University
Xen PV Driver on 10 Gig Networks
• Focus of this talk: RX
[Chart: Throughput on a single TCP connection (netperf). Rate (Gb/s), Xen vs. Linux, for RX, TX, and TX sendfile.]
Network Packet Reception in Xen
[Diagram: Network packet reception in Xen. The driver domain (physical driver, bridge, backend driver) and the guest domain (frontend driver) communicate over an I/O channel, with Xen and the NIC underneath. Steps: 1. Frontend posts a grant on the I/O channel; 2. NIC DMAs the incoming packet into a driver domain buffer; 3. IRQ; 4. Bridge demultiplexes the packet to the backend; 5. Backend grant-copies the packet into the guest buffer; 6. Backend sends an event to the guest; 7. Frontend pushes the packet into the network stack.]
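To make the conventional path concrete, here is a minimal user-space model of the grant copy in step 5; struct grant, grant_copy, and the fixed page size are inventions of this sketch, while the real backend has Xen perform the copy through a grant-table hypercall (GNTTABOP_copy):

/* Minimal model of step 5 (grant copy), for illustration only.
 * In real netback, Xen copies the packet from a driver-domain buffer
 * into the page named by the guest's grant reference. */
#include <string.h>
#include <stdio.h>

#define PAGE_SIZE 4096

struct grant {            /* stand-in for a guest grant reference */
    unsigned char *page;  /* guest RX buffer page */
};

/* Copy one received packet into the guest page the grant refers to. */
static int grant_copy(struct grant *gr, const unsigned char *pkt, size_t len)
{
    if (len > PAGE_SIZE)
        return -1;                /* packet must fit in one granted page */
    memcpy(gr->page, pkt, len);   /* models the hypervisor-mediated copy */
    return 0;
}

int main(void)
{
    unsigned char guest_page[PAGE_SIZE];
    struct grant gr = { guest_page };
    const unsigned char pkt[] = "incoming packet payload";

    if (grant_copy(&gr, pkt, sizeof(pkt)) == 0)
        printf("copied %zu bytes into guest buffer\n", sizeof(pkt));
    return 0;
}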
Mechanisms to reduce driver domain cost:
• Use of multi-queue NIC
− Avoid data copy
− Packet demultiplexing in hardware
• Grant reuse mechanism
− Reduce the cost of grant operations
Using Multi-Queue NICs
[Diagram: Packet reception with a multi-queue NIC, one RX queue per guest. Steps: 1. Frontend posts a grant on the I/O channel; 2. Backend maps the guest buffer; 3. Backend posts the buffer on the guest's device queue; 4. NIC demultiplexes the incoming packet by guest MAC address; 5. NIC DMAs the packet directly into the guest buffer; 6. IRQ; 7. Backend unmaps the buffer; 8. Backend sends an event to the guest; 9. Frontend pushes the packet into the network stack.]
• Advantages of multi-queue:
− Avoid data copy
− Avoid the software bridge
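To make step 4 in the diagram concrete, here is a minimal sketch of MAC-based demultiplexing into per-guest RX queues in plain user-space C; all types, names, and the example MAC addresses are invented, and in the real system this selection happens in NIC hardware:

/* Minimal model of step 4: select the per-guest RX queue by destination
 * MAC address. */
#include <stdio.h>
#include <string.h>

#define NUM_GUESTS 2

struct rx_queue {
    int guest_id;
    unsigned char mac[6];   /* MAC address assigned to this guest */
};

static const struct rx_queue queues[NUM_GUESTS] = {
    { 0, {0x00, 0x16, 0x3e, 0x00, 0x00, 0x01} },
    { 1, {0x00, 0x16, 0x3e, 0x00, 0x00, 0x02} },
};

/* Return the RX queue whose MAC matches the packet's destination. */
static const struct rx_queue *demux(const unsigned char dst_mac[6])
{
    for (int i = 0; i < NUM_GUESTS; i++)
        if (memcmp(queues[i].mac, dst_mac, 6) == 0)
            return &queues[i];
    return NULL;   /* unknown MAC: falls back to the default queue */
}

int main(void)
{
    const unsigned char dst[6] = {0x00, 0x16, 0x3e, 0x00, 0x00, 0x02};
    const struct rx_queue *q = demux(dst);
    if (q)
        printf("DMA directly into a buffer posted by guest %d\n", q->guest_id);
    return 0;
}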
Performance Impact of Multi-queue
• Savings due to multi-queue:
− grant copy
− bridge
• Most of the remaining cost: grant hypercalls (grant + Xen functions)
[Chart: Driver domain CPU cost (cycles/packet) for current Xen vs. multi-queue, broken down into Xen, Xen grant, Linux other, user copy, Linux grant, mm, mem*, bridge, network, netback, and driver components.]
Using Grants with Multi-queue NIC
[Diagram: Grant usage with a multi-queue NIC. Steps: 1. Map grant hypercall; 2. Use page for I/O; 3. Unmap grant hypercall.]
• Multi-queue replaces one grant hypercall (copy) with two hypercalls (map/unmap)
• Grant hypercalls are expensive
− Map/unmap calls on every I/O operation
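To see why this matters at 10 Gb/s rates, here is a toy counting model of the per-I/O map/unmap pattern; the function names are invented stand-ins for the map and unmap grant-table hypercalls:

/* Counting model: with multi-queue but no grant reuse, every received
 * packet costs two hypercalls. */
#include <stdio.h>

static long hypercalls;   /* simulated hypercall count */

static void map_grant(void)   { hypercalls++; }
static void unmap_grant(void) { hypercalls++; }

static void receive_packet(void)
{
    map_grant();     /* 1. map the guest RX buffer into the driver domain */
    /* 2. post the buffer on the device queue; NIC DMAs the packet */
    unmap_grant();   /* 3. unmap once the I/O completes */
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        receive_packet();
    printf("%ld hypercalls for 1000 packets\n", hypercalls);  /* prints 2000 */
    return 0;
}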
Reducing Grant Cost
• Grant reuse
− Do not revoke the grant after the I/O completes
− Keep the buffer page in a pool of unused I/O pages
− Reuse already-granted pages from the buffer pool for future I/O operations
− Avoids map/unmap on every I/O (see the pool sketch below)
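Here is a minimal sketch of such a pool, assuming a simple LIFO free list; all names and sizes are invented, and granting/revoking a page is reduced to a hypercall counter. In steady state, get_buffer and put_buffer recycle granted pages without any hypercall:

/* Model of grant reuse: granted RX pages sit in a free pool and are
 * recycled across I/O operations. */
#include <stdio.h>

#define POOL_MAX 256

static int pool[POOL_MAX];   /* indices of already-granted, unused pages */
static int pool_top;
static long hypercalls;

static int get_buffer(void)
{
    if (pool_top > 0)
        return pool[--pool_top];   /* reuse: no hypercall needed */
    hypercalls++;                  /* pool empty: grant a fresh page */
    static int next_page;
    return next_page++;
}

static void put_buffer(int page)
{
    if (pool_top < POOL_MAX)
        pool[pool_top++] = page;   /* keep the grant, return page to pool */
    else
        hypercalls++;              /* pool full: revoke this grant */
}

int main(void)
{
    for (int i = 0; i < 1000; i++) {
        int pg = get_buffer();     /* post to device queue, DMA, deliver */
        put_buffer(pg);            /* I/O done: recycle instead of revoking */
    }
    printf("%ld hypercalls for 1000 packets\n", hypercalls);  /* prints 1 */
    return 0;
}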
Revoking a Grant when the Page is Mapped in the Driver Domain
• The guest may need to reclaim an I/O page for other uses (e.g., memory pressure in the guest)
• The page must be unmapped in the driver domain before the guest kernel can use it
− To preserve memory isolation (e.g., to protect against driver bugs)
• Revoking the grant requires a handshake between frontend and backend
− This can be slow, especially if the driver domain is not running
Approach to Avoid Handshake when Revoking Grants
• Observation: with a multi-queue NIC there is no need to map the guest page into the driver domain
− Software does not need to look at the packet header, since demux is performed in the device
− Only the page address is needed for the DMA operation
• Approach: replace the grant map hypercall with a shared memory interface to the hypervisor
− A shared memory table provides the translation from guest grant to page address
− No need to unmap the page when the guest revokes a grant (no handshake)
Software I/O Translation Table
[Diagram: Packet reception using the SIOTT. Steps: 1. Guest creates a grant for a buffer page; 2. Guest issues the set hypercall; 3. Xen validates, pins, and updates the SIOTT ("pg" and "use" fields, indexed by grant reference); 4. Frontend sends the grant over the I/O channel; 5. Backend sets the "use" field; 6. Backend gets the page address; 7. Page is used for I/O (DMA, then event to the guest); 8. Backend resets the "use" field; 9. Guest issues the clear hypercall; 10. Xen checks the "use" field and revokes the grant.]
• SIOTT: software I/O translation table
− Indexed by grant reference
− "pg" field: guest page address & permission
− "use" field: indicates whether the grant is in use by the driver domain
• set/clear hypercalls
− Invoked by the guest
− set validates the grant, pins the page, and writes the page address to the SIOTT
− clear requires that "use" = 0
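The following user-space model follows the description above; the table size and all names are assumptions of this sketch, and grant validation and page pinning are elided:

/* Model of the SIOTT and its set/clear hypercalls. */
#include <stdio.h>
#include <stdint.h>

#define SIOTT_SIZE 1024

struct siott_entry {
    uintptr_t pg;   /* guest page address & permission (0 = empty) */
    int use;        /* set by the backend while the grant is in use */
};

static struct siott_entry siott[SIOTT_SIZE];  /* shared with Xen */

/* Guest "set" hypercall: validate the grant, pin the page, publish it. */
static int siott_set(unsigned ref, uintptr_t page_addr)
{
    if (ref >= SIOTT_SIZE)
        return -1;               /* real code also validates the grant */
    siott[ref].pg = page_addr;   /* pinning is elided in this model */
    return 0;
}

/* Guest "clear" hypercall: only succeeds when the backend is not using
 * the grant, so no frontend/backend handshake is needed. */
static int siott_clear(unsigned ref)
{
    if (ref >= SIOTT_SIZE || siott[ref].use)
        return -1;               /* busy: guest must retry later */
    siott[ref].pg = 0;
    return 0;
}

int main(void)
{
    siott_set(42, 0x1000);           /* 0x1000: stand-in page address */
    siott[42].use = 1;               /* backend marks the grant in use */
    printf("clear while in use: %d\n", siott_clear(42));   /* -1 */
    siott[42].use = 0;               /* backend finished the I/O */
    printf("clear when idle: %d\n", siott_clear(42));      /* 0 */
    return 0;
}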
Grant Reuse: Avoid pin/unpin hypercall on every I/O
[Diagram: Grant reuse with an I/O buffer pool in the guest. Steps: 1. Guest creates a grant; 2. set hypercall; 3. Xen validates, pins, and updates the SIOTT; 4. After I/O, the buffer is returned to the pool and the grant is kept; 5. Buffer & grant are reused from the pool and the page is used for I/O (DMA, event); 6. Buffer is returned to the pool again, keeping the grant; 7. On kernel memory pressure, 8. the guest issues the clear hypercall; 9. the SIOTT entry is cleared; 10. the grant is revoked; 11. the page is returned to the kernel.]
Performance Impact of Grant Reuse w/ Software I/O Translation Table
[Chart: Driver domain CPU cost (cycles/packet) for current Xen, multi-queue, and SIOTT w/ grant reuse, broken down into Xen, Xen grant, Linux other, user copy, Linux grant, mm, mem*, bridge, network, netback, and driver components; the cost saving comes from eliminated grant hypercalls.]
Impact of optimizations on throughput
[Chart: Data rate (Gb/s) for current Xen, multi-queue w/ grant reuse, and Linux.]
• Multi-queue w/ grant reuse significantly reduces driver domain cost
• The bottleneck shifts from the driver domain to the guest
− Higher cost in the guest than in native Linux still limits throughput in Xen
[Chart: CPU utilization (%) for current Xen, multi-queue w/ grant reuse, and Linux, split between driver domain and guest.]
Additional optimizations at guest frontend driver
• LRO (Large Receive Offload) support at the frontend
− Consecutive packets on the same connection are combined into one large packet
− Reduces the cost of processing packets in the network stack
• Software prefetch (see the sketch after this list)
− Prefetch the next packet and socket buffer struct into the CPU cache while processing the current packet
− Reduces cache misses at the frontend
• Avoid full-page buffers
− Use half-page (2 KB) buffers (max packet size is 1500 bytes)
− Reduces the TLB working set and thus TLB misses
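As an illustration of the prefetch optimization, this sketch uses GCC's __builtin_prefetch on an invented packet ring; the kernel's netfront would instead apply the prefetch() helper to the next packet and socket buffer while handling the current one:

/* Pull the next packet toward the cache while processing this one,
 * hiding the memory latency of the following iteration. */
#include <stdio.h>

#define RING_SIZE 256

struct pkt {
    char data[2048];     /* half-page buffer (max packet is 1500 bytes) */
    int  len;
};

static struct pkt ring[RING_SIZE];

static void process(struct pkt *p) { (void)p; /* deliver to network stack */ }

static void rx_poll(int count)
{
    for (int i = 0; i < count; i++) {
        __builtin_prefetch(&ring[(i + 1) % RING_SIZE]);  /* next packet */
        process(&ring[i % RING_SIZE]);                   /* current packet */
    }
}

int main(void)
{
    rx_poll(64);
    puts("processed 64 packets with next-packet prefetch");
    return 0;
}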
Performance impact of guest frontend optimizations
• Optimizations bring the CPU cost in the guest close to native Linux
• Remaining cost difference:
− Higher cost in netfront than in the physical driver
− Xen functions to send and deliver events
[Charts: Guest domain CPU cost (cycles/packet) and data rate (Gb/s) for multi-queue w/ grant reuse, frontend w/ LRO, frontend w/ prefetch, reduced buffer size (2 KB), and Linux, with cost broken down into Xen, Xen grant, Linux other, user copy, Linux grant, mm, mem, network, and driver components.]
Impact of all optimizations on throughput
• Multi-queue with software optimizations achieves the same throughput as direct I/O (~8 Gb/s)
• Two or more guests are able to saturate the 10 gigabit link
[Chart: Throughput for the current PV driver, the optimized PV driver (1 guest), the optimized PV driver (2 guests), direct I/O (1 guest), and Linux.]
Conclusion
• Use of multi-queue support in modern NICs enables high-performance networking with Xen PV drivers
− Attractive alternative to direct I/O
  • Same throughput, although with some additional CPU cycles in the driver domain
  • Avoids hardware dependence in the guests
− A light driver domain enables scalability for multiple guests
  • The driver domain can now handle 10 Gb/s data rates
  • Multiple guests can leverage multiple CPU cores and saturate the 10 gigabit link
Status
• Status
− Performance results were obtained on a modified netfront/netback implementation using the original Netchannel1 protocol
− Currently porting the mechanisms to Netchannel2
  • Basic multi-queue is already available in the public netchannel2 tree
  • Additional software optimizations are still under discussion with the community and should be included in netchannel2 soon
• Thanks to
− Mitch Williams and John Ronciak from Intel, for providing samples of Intel NICs and for adding multi-queue support to their driver
− Ian Pratt, Steven Smith, and Keir Fraser, for helpful discussions