Post on 07-Jul-2020
Rik van Riel Sr Software Engineer, Red Hat Inc.Thu May 5 2011
KVM PERFORMANCE
OPTIMIZATIONS INTERNALS
KVM performance optimizations
● What is virtualization performance?● Optimizations in RHEL 6.0
● Selected RHEL 6.1 features
● Vhost-net● KSM & Transparent Huge Pages
● Future KVM optimizations
● Pause Loop Exiting● Free Page Hinting & Dynamic Memory Resizing
● Conclusions
What is virtualization performance?
Different users have different goals:
● Throughput or latency on one virtual machine
● Aggregate throughput of a set of virtual machines
● Largest number of virtual machines on a system, with acceptable performance (density).
KVM in RHEL and RHEV has performance optimizations for all of these goals.
RHEL 6.0 KVM optimizations
● Large hardware scalability● 64 VCPU / 1TB memory guests● X2apic● Lockless RCU synchronization
● Transparent Huge Pages (THP)
● Kernel Samepage Merging (KSM)
● Do not use ticket spinlock inside guest
RHEL 6.0 networking (before)
● KVM started as QEMU acceleration● Move perf critical bits to kernel
● QEMU network emulation
● Network packets copied● HW -> host kernel● Host kernel -> qemu● Qemu -> guest
● Copy to/from user overhead
Guest VM
kernel/HV
QEMU
tx
rx
tap
bridge
NIC
vNIC
RHEL 6.1 vhost-net
● Virtio network emulation in kernel● Eliminates userspace copies● Improves latency & throughput● Reduces overhead
● In the future● Macvtap
● Bridge & tap in one● Has some limitations● Great for VEPA or VN-Link
Guest VM
kernel/HV
QEMU
tx
rx
tap
bridge
NIC
vhost
RHEL 6.1 KVM network performance
32 64 128 256 512 1024 204 8 4 092 8192 16384 32768 65507
0
50
100
150
200
250
300
350
4 00
8 Guest Scale Out RX Vhost vs Virtio - % Host CPU
Mbit per % CPU netperf T CP_STREAM
Vho st Virt io
Message Size (Byt es)
Mb
it /
% C
PU
(b
igg
er
is b
ett
er)
Kernel Samepage Merging (KSM)
● Share identical pages● Libraries● Executables● Images● Free pages (Windows)
● 4kB pages only
● Ksmd scans memory● CPU / space tradeoff
G1 G2
Kernel
KSM & Transparent Huge Pages
● KSM in 6.0 only scans 4kB pages● Transparent Huge Pages (THP) memory excluded● THP memory broken into small pages at swap time
● Too late
● KSM in 6.1 “looks inside” huge pages● If shareable 4kB page found● ... break up huge page into small pages● ... and share those● Only done when free memory gets low
KSM tuning
● Automatically done by ksmtuned
● Ksmd started when free memory low● <10% or 2GB free memory● Configurable in /etc/ksmtuned.conf
● KSM uses CPU time, to save memory● Good for packing guests onto systems (density)● Can help avoid or reduce swap IO
● If CPU is bottleneck and memory is plentyful● ... disable KSM
Pause Loop Exiting – guest spinlocks
● VCPU 2 holds a spinlock
● VCPU 1 wants the spinlock, spins for it
● After VCPU 2 releases the lock, VCPU 1 grabs it
VCPU 1 VCPU 2
Pause Loop Exiting – the problem
● But what if VCPU 2 is not running?● Page fault, host scheduler, KVM/qemu lock, ...
● VCPU 1 could spin for a long time● Waste lots of CPU time● Could keep VCPU 2 from running!
VCPU 1 VCPU 2
Pause Loop Exiting – a hardware solution
● Hardware detection of a spinning VCPU● Intel – Pause Loop Exiting (PLE)● AMD – Pause Filtering (PF)
● When a VCPU spins for “too long”● Detect it in hardware and trap to the host● Host “does something”...● Problem solved!
Pause Loop Exiting – directed yield
● Host side solution is two-fold● Waste little time spinning● Get lock holder to run ASAP, it may release the lock
● Spinner and lock holder could run on different CPUs● Implement CFS yield_to() to temporarily boost lock
holder's priority
● Guest with lock contention should not lose CPU time● Reimplement CFS yield() to temporarily give up CPU
time, without losing priority
PLE benchmark results
● AMQP benchmark with 8 and 16 byte messages
● 4 VCPU guest on 4 physical CPUs
● No, 2 or 4 VCPU load for no, 1.5x and 2x overcommit
● 10-25% speedup at high CPU overcommit ratios
no 1.5x 2.0x
0
50000
100000
150000
200000
250000
AMQP 8 byte
no PLEPLE
no 1.5x 2.0x
0
50000
100000
150000
200000
250000
AMQP 16 byte
no PLEPLE
Free Page Hinting
● Nested LRU problem● Sum of guest memory exceeds host memory● Host swaps out least used pages● Guest frees least used pages● Guest re-uses least used pages first● Combined guest & host swapping can lead to terrible
performance
Nested LRU problem illustrated
● Three guests
● Host is swapping
● Swap IO due to free memory
● Guests re-use free pages
for new content
● Free memory contains no
useful data
● Swap IO unnecessary
HOST MEMORYHOST MEMORY
SWAPSWAP
USED FREE
GUEST MEMORY
Free page hinting
● Free pages contain no useful information● The host can throw away free guest pages!
● Swapout, KSM, etc
● Give the guest a fresh page on use● Skip disk IO on swapin
● Guest needs to inform host what memory is free● Can use a big bitmap● Be careful at state transitions (free->used)
Nested LRU with page hinting
● Most free pages eliminated
● Guests now fit in RAM● The used bits...
● Swap IO greatly reduced
● But what if sum of used
memory exceeds RAM size?
HOST MEMORYHOST MEMORY
SWAPSWAP
USED FREE
GUEST MEMORY
Dynamic memory resizing
● We can do more than just eliminate free memory● Guest has working set, page cache and free
● When the host is near swapout● It can ask each guest to free some memory● Do not shrink guest below minimum size
● When a guest is near swapout● It can ask the host for some memory back● Do not grow guest above maximum size
Dynamic memory resizing
● Ask guests to free extra
memory when host needs
● Obey min/max guest size
● Fit more guests per host
● Combine with cgroups for
guest prioritization
● Avoid most swap IO
HOST MEMORYHOST MEMORY
SWAPSWAP
USED FREE
GUEST MEMORY
CACHE
Conclusions
● KVM on RHEL/RHEV the top virt performer● 3 of the top 5 SPECvirt scores, including #1 (RHEL 6.0)● RHEL 6.1 even faster, more optimizations ahead
● Keep KVM the best performing virt platform● For single guests, aggregate throughput and density
● Use new hardware features like SR-IOV and PLE
● Smart resource management● More predictable performance, with more guests● Cheaper virtualization hosting
More information
● KVM project
http://www.linux-kvm.org/
● Red Hat kbase, tech briefs, etc
https://access.redhat.com/knowledge/
● Linux Weekly News● http://lwn.net/