Performance Troubleshooting
Transcript of “Performance Troubleshooting”
Valentin Bondzio, Research Engineer
Agenda
• Approaching Performance Issues
• Esxtop introduction (Troubleshooting Examples)
• References
Approaching Performance Issues
• Perception
• “XY Problem”
• Comparison
• Benchmark / Dependencies
• Tools
Perception
Subjective and generic, not quantified:
• “A user reported that his application is slow.”
• “Our VM used to be much faster.”
Somewhat quantified, only symptoms:
• “CPU usage during a network file transfer is 90%.”
• “The VM seems to hang whenever I start a program.”
Symptoms might be miles away from the root cause, e.g.:
• A VM has a noticeable time skew and lags behind
• Root cause: antivirus scans run at the same time and stress the storage
“XY Problem”
Wrong conclusions about the issue lead to the wrong questions. An example:
• <problem X> Alice’s car won’t start
• <problem Y> She asks Bob to help her replace the battery
• …
• <problem Y> The car still does not start
• <problem X> The real issue is an empty gas tank
Keeping an open mind will reduce the time wasted:
• Approach the issue from all sides and don’t rush to conclusions
• Take note of all the symptoms and the state of the environment
Comparisons
Common comparisons are:
• The old system vs. the new one
• A physical vs. a virtual system
This usually means different settings or underlying hardware. Examples:
• The CPUs in the old box might be two generations behind, but it has twice as many
• The underlying RAID layout in the new system is different
Do not compare apples with oranges:
• Make sure the workload / benchmark is consistent and repeatable
• Keep the configuration as equal as possible, e.g. compare a 4-core physical system to a 4 vCPU VM that is not contended
Benchmarks / Dependencies
Is the benchmark reproducible?
• Do not use the live system, where e.g. the number of users might vary
Be aware that most benchmarks stress multiple components, e.g.:
• IO tests from within the VM will also stress the CPU
• A file copy over the network can also be affected by the storage speed (read/write)
The goal is to find the bottleneck:
• Identify the workload pattern of the production system
• Benchmark components (CPU, memory, network, disk) on their own
Tools
Performance Charts in vCenter Server
• Use them to check for patterns across multiple VMs / hosts / datastores
• Compare current loads to ones in the past
esxtop
• Our “goto” tool, with enough granularity for 99% of issues
vscsiStats
• Identify IO patterns
In-Guest tools
• Iometer / Iozone
• Process Explorer / atop
Esxtop introduction
• Navigation
• CPU
• Memory
Navigation (“V”)
‘V’ shows VMs only
Navigation (Views and Fields)
Esxtop views
• c: cpu, i: interrupt, m: memory, n: network, d: disk adapter, u: disk device, v: disk VM, p: power mgmt
‘f’ Fields
‘h’ Help
CPU (USED / UTIL)
PCPU USED (%)
• “effective work”: non-halted cycles in reference to the nominal frequency
PCPU UTIL (%)
• non-halted cycles in reference to the elapsed time at the current frequency
CORE UTIL (%)
• only displayed when Hyper-Threading is enabled
(Diagram: with a nominal frequency of 2.6 GHz, 50% UTIL equals 50% USED; downscaled to 1.3 GHz, 100% UTIL equals 50% USED and 50% UTIL equals 25% USED)
CPU (USED / UTIL)
Why is PCPU USED (%) different from PCPU UTIL (%)?
• Frequency scaling
• Downscaling (due to power management, e.g. Intel SpeedStep); ‘p’ Power Management view
• Upscaling (due to dynamic overclocking, e.g. Intel Turbo Boost)
• Hyper-Threading: ESXi 5.0 charges 62.5% per logical CPU (concurrent use)
'+' means busy, '-' means idle.
(1) PCPU 0: +++++----- (UTIL: 50% / USED: 50%)
    PCPU 1: -----+++++ (UTIL: 50% / USED: 50%)
(2) PCPU 0: +++++----- (UTIL: 50% / USED: 31.25%)
    PCPU 1: +++++----- (UTIL: 50% / USED: 31.25%)
(3) PCPU 0: +++++----- (UTIL: 50% / USED: 42.5%, i.e. 30% + 20%/1.6)
    PCPU 1: ---+++++-- (UTIL: 50% / USED: 42.5%, i.e. 20%/1.6 + 30%)
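The two effects can be sketched in a few lines of Python (function names and sample values are illustrative, not esxtop output; the nominal frequency is assumed to be 2.6 GHz as in the diagram above):

```python
NOMINAL_MHZ = 2600   # assumed nominal CPU frequency for this sketch
HT_CHARGE = 0.625    # ESXi 5.0 charges 62.5% per logical CPU during concurrent use

def used_from_util(util_pct, current_mhz):
    # USED references the nominal frequency, UTIL the elapsed time
    return util_pct * current_mhz / NOMINAL_MHZ

def used_with_ht(exclusive_pct, concurrent_pct):
    # time busy alone is charged in full; time busy together with the
    # hyper-thread sibling is charged at 62.5% (i.e. divided by 1.6)
    return exclusive_pct + concurrent_pct * HT_CHARGE

print(used_from_util(100, 1300))  # downscaled to 1.3 GHz: 50.0% USED
print(used_with_ht(0, 50))        # example (2) above: 31.25% USED
print(used_with_ht(30, 20))       # example (3) above: 42.5% USED
```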
CPU (general per VM counters)
%USED
• amount of CPU usage that is accounted to this world / VM
%RUN
• percentage of total scheduled runtime
%RDY (Ready Time)
• percentage of time the VM was ready to run but not scheduled
%MLMTD (Max Limited)
• percentage of time not scheduled due to a CPU limit (part of %RDY)
%SWPWT (Swap Wait)
• percentage of time the VM was not scheduled because it waited for memory to be swapped in from disk
CPU related (limits)
Two VMs on the same vSwitch; VM1 responds slowly to requests
• In this example, represented by Ping*
VM1 is busy:
• High %RDY time indicates that the VM is contended for CPU resources
• %RDY = %MLMTD means all of the ready time is caused by a CPU limit
*Ping is not a performance benchmark! In this case it is just an easy stand-in and visualisation for server requests.
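The %RDY / %MLMTD relationship can be expressed as a rough heuristic (the function name, messages and thresholds are illustrative, not official guidance):

```python
def interpret_ready_time(rdy_pct, mlmtd_pct, tol=0.5):
    # Sketch: compare %RDY against %MLMTD (both in percentage points).
    if rdy_pct < 5:
        return "no significant CPU contention"
    if abs(rdy_pct - mlmtd_pct) <= tol:
        # all ready time accounted for by the limit
        return "ready time is caused by a CPU limit"
    return "ready time is caused by contention for physical CPUs"

print(interpret_ready_time(45.0, 44.8))  # ready time is caused by a CPU limit
print(interpret_ready_time(45.0, 0.0))   # ready time is caused by contention for physical CPUs
```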
CPU related (limits)
Check that there is a CPU limit with the ‘f’ field selection (CPU ALLOC)
AMAX indicates a 2000 MHz limit (1000 MHz per vCPU)
• Removing the limit will normalize the responsiveness of VM1
CPU related (fairness)
Performance not as good as expected on Xeon 5500 and later
• Intel Hyper-Threading is enabled
• Performance degradation is especially noticeable if:
• the CPU utilization of the host is higher than 50%
• the workload has a particularly bursty CPU usage pattern
The fairness scheduling algorithm works differently with HT enabled
• VMs that lag behind in “vtime” are given a full core for each vCPU to catch up
• Equal to setting the Hyperthreaded Core Sharing mode for that VM to “None”
(Chart: throughput of a full core without HT vs. the HT1 / HT2 logical CPUs)
CPU related (fairness)
Fairness is important to honor shares, reservations and limits
• Defaults perform well in most scenarios
• Some workloads benefit from a higher “fairness threshold”
“HaltingIdleMsecPenalty” (HIMP) and “HaltingIdleMsecPenaltyMax” (HIMPmax)
• Control how far behind a VM can fall before it is given a full core
• HIMP is per vCPU / HIMPmax is per VM
• Not much performance benefit beyond HIMP = 2000 / HIMPmax = 16000
• Always remember to also increase HIMPmax, since its default is 800
The setting is deprecated in ESXi 5.0
• The scheduler was enhanced to maximize throughput and fairness with HT
• Upgrade to 5.0 not yet an option? Benchmark your systems with a higher HIMP
KB: HaltingIdleMsecPenalty Parameter: Guidance for Modifying vSphere's Fairness/Throughput Balance
CPU related (power capping)
The Guest seems to use a lot of CPU
Frequency scaling
• In most cases controlled by the BIOS power options
CPU related (power capping)
Consult your vendor! BIOS power option names differ between vendors, e.g. Fujitsu, HP and IBM.
CPU related (power capping)
Check the ESX host power policy
Contact your hardware support:
• BIOS or hardware issues could lead to frequency downscaling
Memory (Memory reclamation counters)
MCTLSZ (MemCtl Size) / MCTLTGT (MemCtl Target)
• currently reclaimed memory via ballooning / balloon reclamation goal
• a target > 0 means active memory pressure
SWCUR (Swapped Currently) / SWTGT (Swap Target)
• amount of Guest memory that is swapped to disk / target swap size
• SWCUR is not actively reduced; the Guest must touch the pages
SWR/s (Swap Read) / SWW/s (Swap Write)
• Guest memory in MB/s that is currently paged in / out by the hypervisor
• SWR/s > 0 will affect the Guest (check %SWPWT in the CPU view)
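Reading these counters together can be sketched as a simplified heuristic (function name, thresholds and messages are illustrative, not esxtop output):

```python
def swap_state(swcur_mb, swtgt_mb, swr_per_s):
    # Simplified sketch of how the hypervisor swap counters are read together.
    if swr_per_s > 0:
        return "actively swapping in; expect %SWPWT in the CPU view"
    if swtgt_mb > 0:
        return "active memory pressure: the hypervisor still wants pages swapped"
    if swcur_mb > 0:
        return "stale swap: pages stay on disk until the Guest touches them"
    return "no hypervisor-level swapping"

print(swap_state(512, 0, 0.0))   # stale swap: pages stay on disk until the Guest touches them
print(swap_state(512, 0, 12.5))  # actively swapping in; expect %SWPWT in the CPU view
```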
Memory related (limit)
A limit will deny physical resources even if they are available
While VM memory is swapped in, the VM will not be scheduled
• Check %SWPWT in the CPU view
Memory related (limit)
It is still the most common reason for performance issues
• You can check if a VM has a limit via the GUI, a PowerCLI query or esxtop:
• memory view with the ‘f’ MEM ALLOC field
• -1 is the default (“unlimited”)
• PowerCLI in only 3 lines:

Get-VM |
Get-VMResourceConfiguration |
Where-Object {$_.MemLimitMB -ne '-1'}

Check your templates for long forgotten limit settings
Memory (general per VM counters)
MEMSZ (Memory Size)
• amount of memory assigned to the VM
TCHD (Touched)
• recently used memory, based on statistical sampling by the VMkernel
• not comparable to Guest OS internal consumption counters
SHRDSVD (Shared Saved)
• memory that is saved for this VM because of TPS (Transparent Page Sharing)
GRANT
• memory that has been touched at least once by the VM
• GRANT – SHRDSVD = VM memory that is backed by machine memory
COWH (Copy-On-Write Hinted)
• memory that is already hashed and could be shared
Memory (counter mapping esxtop -> vSphere Client)
esxtop (memory view)
VM Resource Allocation tab
• Consumed = GRANT – SHRDSVD + OVHD
• Active = TCHD
Host VM Summary tab
• Host Mem – MB = Consumed
• Guest Mem – % = Active
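The mapping is simple arithmetic, illustrated here with made-up counter values (all numbers in MB are hypothetical, not from a real host):

```python
# Hypothetical esxtop memory-view values for one VM, in MB
grant = 4096     # guest memory touched at least once
shrdsvd = 1024   # machine memory saved by Transparent Page Sharing
ovhd = 90        # virtualization overhead memory
tchd = 512       # recently touched (active) memory

consumed = grant - shrdsvd + ovhd  # "Host Mem - MB" in the vSphere Client
active = tchd                      # basis of "Guest Mem - %"

print(consumed)  # 3162
print(active)    # 512
```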
Memory related (memory consumption)
Host memory usage alarm
• high “Host Mem”, very low “Guest Mem”
esxtop memory view
• very low amount of shared pages
• high amount of shareable pages
Memory related (memory consumption)
Compared to another VM with the same amount of memory:
• relatively low “Host Mem”
• very high amount of shared pages, mostly zero pages
• relatively low amount of shareable pages
Transparent Page Sharing can only share small (4 KB) pages
• VMs running in hwMMU mode are backed with large (2 MB) pages
• Large pages can result in a ~20% performance improvement
KBs: “Use of large pages can cause memory to be fully allocated” and “Transparent Page Sharing (TPS) in hardware MMU systems”
Memory related (memory consumption)
LPs (large pages) will be broken down once the host becomes overcommitted
Identify the Monitor Mode of a VM via the CLI:

# grep "MONITOR MODE" vmware.log | cut -d ":" -f 4-
vmx| MONITOR MODE: allowed modes          : BT HV HWMMU
vmx| MONITOR MODE: user requested modes   : HWMMU
vmx| MONITOR MODE: guestOS preferred modes: BT HWMMU HV
vmx| MONITOR MODE: filtered list          : HWMMU
Take Home Message
Check for unintended memory limits
Make sure power management is set according to your needs
• Disable C1E in the BIOS for latency-sensitive applications
Document performance issues thoroughly
Thank you for your time and enjoy VMworld Europe!
References
CPU Scheduling / Memory Management
• VMware vSphere: The CPU Scheduler in VMware ESX 4.1
• Understanding Memory Resource Management in VMware vSphere 5.0
• Memory Resource Management in VMware ESX Server
• Understanding Host and Guest Memory Usage
• Large Page Performance
• Whitepaper for RVI (AMD) Performance Improvements
• Whitepaper for EPT (Intel) Performance Improvements
Best Practices / Troubleshooting Guide
• Performance Best Practices for VMware vSphere 5.0
• VMware vCenter Server Performance and Best Practices for vSphere 4.1
• Troubleshooting Performance Related Problems in vSphere 4.1 Environments
References
Virtual Machine Monitor
• Software and Hardware Techniques for x86 Virtualization
• Virtual Machine Monitor Execution Modes in VMware vSphere 4.0
• A Comparison of Software and Hardware Techniques for x86 Virtualization
• Performance aspects of x86 virtualization
• The Evolution of an x86 Virtual Machine Monitor (non-free)
vCenter Stats / General
• vSphere Resource Management Guide 5.0
• vCenter Performance Counters
• Understanding VirtualCenter Performance Statistics
• Virtualization performance: perspectives and challenges ahead (non-free)
References
Esxtop
• ESXtop for Advanced Users (2008)
• ESXtop for Advanced Users (2009)
• Troubleshooting using ESXTOP for Advanced Users (2010)
• Interpreting esxtop 4.1 Statistics
• esxtop (Yellow Bricks, external)
Backup Slide: Disk (latency counters)
‘u’ Disk Device View
GAVG/cmd (Guest)
• latency observed by the Guest OS
KAVG/cmd (VMkernel)
• latency introduced by the VMkernel
DAVG/cmd (Device)
• latency observed above the SCSI layer
QAVG/cmd (Queue)
• queue time that is introduced by the physical layer
(Diagram: IO path from the VM through the VMkernel and SCSI layers down to the device, showing where GAVG, KAVG, QAVG and DAVG are measured)
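These counters relate arithmetically: GAVG is the latency the Guest sees, i.e. VMkernel time (KAVG, which contains QAVG) plus device time (DAVG). A quick sanity check with made-up values:

```python
# Made-up per-command latencies in milliseconds
davg = 8.0   # time spent at the device
qavg = 0.4   # time spent queued in the VMkernel (contained in KAVG)
kavg = 0.5   # total VMkernel-introduced latency

gavg = kavg + davg  # latency as observed by the Guest OS
print(gavg)  # 8.5

# QAVG can never exceed KAVG, since queueing is part of the VMkernel time
assert qavg <= kavg
```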
Backup Slide: Disk (new in ESX 4.1)
‘v’ VM Disk view now uses vscsiStats
• now only displays VM worlds
• ‘e’ now shows the backing disk instead of the VM’s sub-worlds
• vm-support -S will no longer collect per-VM storage stats
Now includes NFS stats
Acknowledgements
Many thanks to Emiliano Turra for answering so many questions over the last years!