vSphere 6.5 Host Resources Deep Dive
Compute & Network
Denneman & Hagoort
#NLVMUG
Introduction
Niels Hagoort
• Freelance Architect
• VMware VCDX #212
• VMware vExpert (NSX)
www.cloudfix.nl

Frank Denneman
• Senior Staff Architect
• VMware VCDX #29
• VMware vExpert
www.frankdenneman.nl
COMPUTE
NUMA – NUMA – NUMA
Insights In Virtual Data Centers
Modern Servers Are Non-Uniform Memory Access (NUMA) Systems
Local and Remote Memory Access
NUMA Focus Points
• NUMA Configuration
• DIMM Types
• Size the VM to Match the CPU Topology
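As a quick host-side check, the physical NUMA layout can be read from the ESXi shell. A hedged sketch (command output varies per platform; verify against your ESXi build):

esxcli hardware memory get      # reports physical memory size and NUMA node count
esxcli hardware cpu global get  # reports CPU packages, cores, and threads

Together these show how many NUMA nodes the host exposes and how many cores sit in each CPU package, which is the baseline for the VM sizing advice below.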
DIMM Per Channel
Regions For Interleaving Memory
3 DIMMs Per Channel Performance
Memory Ranking
• DRAM Chips Are Grouped in Ranks
• Ranks Create Electrical Load
• Max 8 Ranks per Channel
• LRDIMMs Abstract Ranks Using an RCD Buffer
DIMM Type Performance Impact
RDIMM Performance
LRDIMM Performance
Performance Costs!
CONSISTENCY IS KEY!
Performance Second
Uneven NUMA Configurations
Symmetric NUMA Configurations
Asymmetric NUMA Nodes
Unbalanced Channel Configuration
Heterogeneous DIMM Configuration
Today’s Sweet Spot!
Right Size your VM
Alignment results in more consistent performance
ESXi CPU Schedulers
• CPU Scheduler Allocates Core or HT Cycles
• NUMA Scheduler Handles Initial Placement (IP) + Load Balancing (LB)
• VM vCPU Configuration Impacts IP & LB
CPU + NUMA Scheduling Constructs
Memory Footprint
NUMA Scheduling Constructs
ESXi Pre-6.5 Cores Per Socket
Defragmented Memory Address Space
ESXi 6.5 Cores Per Socket
Unifying the Cache
Align Cores Per Socket with the Physical Core Count of the CPU Package
(Screenshot: Cores per Socket setting)
Unifying the Cache
Optimizing VM footprint
• VM Configuration Does Not Exceed the NUMA Node Configuration
• Use Cores Per Socket Wisely
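A minimal sketch of what this looks like in a VM's .vmx file, assuming a host with two 10-core sockets (the sizes are illustrative, not from the deck):

numvcpus = "16"
cpuid.coresPerSocket = "8"

With 8 cores per virtual socket, the 16 vCPUs form two virtual sockets, each of which fits inside one physical NUMA node, so the guest OS sees a topology that matches the host.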
NETWORK
VXLAN – NETPOLL – Tx THREADS
pNIC VXLAN performance considerations
• Additional layer of packet processing
• Consumes CPU cycles for each packet for encapsulation/de-capsulation
• Some of the offload capabilities of the NIC cannot be used (TCP-based)
• VXLAN offloading! (TSO / CSO)
VXLAN
esxcli network nic get -n vmnicX
esxcli system module parameters list -m bnx2x
RSS & NetQueue
• NIC support required (RSS / VMDq)
• VMDq is the hardware feature; NetQueue is the feature baked into vSphere
• RSS & NetQueue are similar in basic functionality
• RSS uses hashes based on IP / TCP port / MAC
• NetQueue uses MAC filters
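A hedged sketch of how RSS is typically switched on: via a driver module parameter. Parameter names and value formats differ per driver, so the bnx2x value below is illustrative only; check the parameters list first:

esxcli system module parameters list -m bnx2x
esxcli system module parameters set -m bnx2x -p "RSS=4"

A host reboot (or module reload) is required before the driver picks up the new setting.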
Without RSS for VXLAN
RSS enabled (>1 thread per pNIC)
Receive network I/O with VXLAN
“What is the maximum performance of the vSphere (D)vSwitch?”
Network I/O CPU consumption
• By default, one transmit (Tx) thread per VM
• By default, one receive (Netpoll) thread per pNIC
• Transmit (Tx) and receive (Netpoll) threads consume CPU cycles
Netpoll Thread
Netpoll Thread Scaling
vsish
/> cat /world/66076/name
"vmnic1-pollWorldnetpoll[00]"
net-stats -A -t vW
Tx Thread
• VMXNET3 is required!
• Example for vNIC0:
ethernet0.ctxPerDev = "1"
Additional Tx Thread
Additional Tx thread
Additional Tx thread
/> cat /world/194786/name
NetWorld-VM-194786
/> cat /world/242681/name
NetWorld-Dev-67108879-Tx
• Transmit (Tx) and receive (Netpoll) threads can be scaled!
• Take the extra CPU cycles for network I/O into account!
Summary
Shameless Plug (May 2017)
Thanks!
@FrankDenneman
@NHagoort
@HostDeepDive