vSphere 6.5 Host Resources Deep Dive
Compute & Network
Denneman & Hagoort
#NLVMUG
Introduction
Niels Hagoort
• Freelance Architect
• VMware VCDX #212
• VMware vExpert (NSX)
www.cloudfix.nl

Frank Denneman
• Senior Staff Architect
• VMware VCDX #29
• VMware vExpert
www.frankdenneman.nl
COMPUTE
NUMA – NUMA – NUMA
Insights In Virtual Data Centers
Modern Servers Are Non-Uniform Memory Access (NUMA) Systems
Local and Remote Memory Access
NUMA Focus Points
• NUMA Configuration
• DIMM Types
• Size the VM to Match the CPU Topology
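As a quick host-side check, the physical NUMA layout can be read from the ESXi shell. A hedged sketch (command output varies per platform; verify against your ESXi build):

esxcli hardware memory get      # reports physical memory size and NUMA node count
esxcli hardware cpu global get  # reports CPU packages, cores, and threads

Together these show how many NUMA nodes the host exposes and how many cores sit in each CPU package, which is the baseline for the VM sizing advice below.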
DIMM Per Channel
Regions For Interleaving Memory
3 DIMMs Per Channel Performance
Memory Ranking
• DRAM Chips Are Grouped in Ranks
• Ranks Create Electrical Load
• Max 8 Ranks per Channel
• LRDIMMs Abstract Ranks Using an RCD Buffer
DIMM Type Performance Impact
RDIMM Performance
LRDIMM Performance
Performance Costs!
CONSISTENCY IS KEY!
Performance Second
Uneven NUMA Configurations
Symmetric NUMA Configurations
Asymmetric NUMA Nodes
Unbalanced Channel Configuration
Heterogeneous DIMM Configuration
Today’s Sweet Spot!
Right Size your VM
Alignment results in more consistent performance
ESXi CPU Schedulers
• CPU Scheduler Allocates Core or HT Cycles
• NUMA Scheduler Handles Initial Placement (IP) + Load Balancing (LB)
• VM vCPU Configuration Impacts IP & LB
CPU + NUMA Scheduling Constructs
Memory Footprint
NUMA Scheduling Constructs
ESXi Pre-6.5 Cores Per Socket
Defragmented Memory Address Space
ESXi 6.5 Cores Per Socket
Unifying the Cache
Align Cores Per Socket with the Physical Core Count of the CPU Package
(Screenshot: Cores per Socket setting)
Unifying the Cache
Optimizing VM footprint
• VM Configuration Does Not Exceed the NUMA Node Configuration
• Use Cores Per Socket Wisely
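A minimal sketch of what this looks like in a VM's .vmx file, assuming a host with two 10-core sockets (the sizes are illustrative, not from the deck):

numvcpus = "16"
cpuid.coresPerSocket = "8"

With 8 cores per virtual socket, the 16 vCPUs form two virtual sockets, each of which fits inside one physical NUMA node, so the guest OS sees a topology that matches the host.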
NETWORK
VXLAN – NETPOLL – Tx THREADS
pNIC VXLAN performance considerations
• Additional layer of packet processing
• Consumes CPU cycles for each packet for encapsulation/de-capsulation
• Some of the offload capabilities of the NIC cannot be used (TCP-based)
• VXLAN offloading! (TSO / CSO)
VXLAN
esxcli network nic get -n vmnicX
esxcli system module parameters list -m bnx2x
RSS & NetQueue
• NIC support required (RSS / VMDq)
• VMDq is the hardware feature; NetQueue is the feature baked into vSphere
• RSS & NetQueue are similar in basic functionality
• RSS uses hashes based on IP / TCP port / MAC
• NetQueue uses MAC filters
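A hedged sketch of how RSS is typically switched on: via a driver module parameter. Parameter names and value formats differ per driver, so the bnx2x value below is illustrative only; check the parameters list first:

esxcli system module parameters list -m bnx2x
esxcli system module parameters set -m bnx2x -p "RSS=4"

A host reboot (or module reload) is required before the driver picks up the new setting.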
Without RSS for VXLAN
RSS enabled (>1 thread per pNIC)
Receive network I/O with VXLAN
“What is the maximum performance of the vSphere (D)vSwitch?”
Network I/O CPU consumption
• By default, one transmit (Tx) thread per VM
• By default, one receive (Netpoll) thread per pNIC
• Transmit (Tx) and receive (Netpoll) threads consume CPU cycles
Netpoll Thread
Netpoll Thread Scaling
vsish
/> cat /world/66076/name
"vmnic1-pollWorldnetpoll[00]"
net-stats -A -t vW
Tx Thread
• VMXNET3 is required!
• Example for vNIC0:
ethernet0.ctxPerDev = "1"
Additional Tx Thread
Additional Tx thread
Additional Tx thread
/> cat /world/194786/name
NetWorld-VM-194786
/> cat /world/242681/name
NetWorld-Dev-67108879-Tx
• Transmit (Tx) and receive (Netpoll) threads can be scaled!
• Take the extra CPU cycles for network I/O into account!
Summary
Shameless Plug (May 2017)
Thanks!
@FrankDenneman
@NHagoort
@HostDeepDive