VMware Performance Troubleshooting
Presented by Chris Kranz
Topics Covered
• Introduction
• Root Cause Analysis
• Performance Characteristics
• CPU
• Networking
• Memory
• Disk
• Virtual Machine optimisation
• ESXTop
• vm-support
• Service Console
• Resource Groups
• Design Guidelines
• Capacity Planner limitations and cautions
• Conclusion
• Reference Articles
Introduction
Multiple layers of virtualisation are used to increase service levels, availability and manageability
However, multiple layers of virtualisation often mask performance and configuration issues, making them harder to troubleshoot and correct
The worst outcome is that performance issues after a virtualisation project create the perception that VMware reduces performance, which can damage future confidence in VMware
• Virtual Machine Resources
– CPU
– Memory
– Disk
– Networking
Performance Basics
Resource Maximums
                         Host    Guest
Logical Processors       64      N/A
Virtual CPUs             N/A     8
Virtual CPUs per Core    20      N/A
Memory                   1TB     256GB
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf
Typical Host
vSphere 1U Host
CPUs: 2 x Quad Core
Memory: 32-64GB RAM
Typically 3 VMs per core, so 24 VMs per host
Each VM has 2GB of RAM, so 24 x 2GB = 48GB of RAM
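The sizing arithmetic on this slide can be sketched in a few lines of Python. The function name and parameters are illustrative only; the figures (2 sockets, 4 cores each, 3 VMs per core, 2GB per VM) are the ones quoted on the slide.

```python
def typical_host_sizing(sockets, cores_per_socket, vms_per_core, ram_per_vm_gb):
    """Return (VM count, total VM RAM in GB) for a simple per-core sizing rule."""
    cores = sockets * cores_per_socket
    vms = cores * vms_per_core
    return vms, vms * ram_per_vm_gb


# Slide figures: 2 x quad core, 3 VMs per core, 2GB of RAM per VM.
vms, ram_gb = typical_host_sizing(2, 4, 3, 2)
print(f"{vms} VMs needing {ram_gb}GB of RAM")
```

The result sits inside the 32-64GB range given for a typical host, which is why the slide treats this as a reasonable starting ratio rather than a limit.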
Root Cause Analysis
http://www.vmware.com/resources/techresources/10066
Root Cause ...
• Do not rely on guest tools, but they
– Can show high CPU and memory utilisation
– Can measure latency and throughput of disk and network interfaces
• Use the virtualisation layer to diagnose the cause:
– The guest is unaware of the virtualisation workload
– The way in which guest OSs account for time is different
– The guest has no visibility of available resources
Monitoring Performance
• esxtop (service console only)
• resxtop (remote command line utilities)
• Performance graphs in vCenter
Performance Analysis Tools
• esxtop can be run:
– Interactively
– In batch mode (e.g. esxtop -a -b > analysis.csv)
– Load the batch output into Windows perfmon or MS Excel
• Two keys to remember
– H : help
– F : fields to display
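Besides perfmon or Excel, the batch CSV can also be read with a short script. The sketch below is only an illustration: the sample text is hypothetical, and the perfmon-style column names (host `esx01`, VM `web01`, the `% Ready` counter label) are assumptions about what a capture looks like, so check the header row of your own file before relying on them.

```python
import csv
import io

# Hypothetical two-sample excerpt. A real "esxtop -a -b" capture has far more
# columns; the header layout here is an assumption for illustration.
SAMPLE = (
    '"(PDH-CSV 4.0) (UTC)(0)",'
    '"\\\\esx01\\Group Cpu(101:web01)\\% Ready",'
    '"\\\\esx01\\Group Cpu(101:web01)\\% Used"\n'
    '"02/24/2010 10:43:21","4.2","61.0"\n'
    '"02/24/2010 10:43:26","12.8","88.5"\n'
)


def column_average(csv_text, name_fragment):
    """Average all samples of the first column whose header contains name_fragment."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    index = next(i for i, name in enumerate(header) if name_fragment in name)
    values = [float(row[index]) for row in samples]
    return sum(values) / len(values)


print(column_average(SAMPLE, "% Ready"))
```

For a real capture, read the file contents (for example `open("analysis.csv").read()`) and pass them in place of `SAMPLE`.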
esxtop
esxtop basics
Number of Worlds
Name of Resource Pool, Virtual Machine or World
Host Resources
Performance Characteristics
CPU: Slow Processing, High CPU Wait
Networking: Packet Loss, Slow Network
Memory: Slow Processing, Disk Swapping
Disk: Log Stalls, Disk Queue
Slow Application Performance
Reduced User Experience
Data Loss and Corruption
CPU - ESX Scheduler
Service Console
Virtual Machine
Limits / Shares / Reservations
Basic World States: Ready / Run / Wait
CPU States: Ready / Usage / Wait
CPU - esxtop
• PCPU(%): CPU utilization
• %USED: Utilization
• %RDY: Ready Time
• %RUN: Run Time
• %WAIT: Wait and idling time
High %RDY combined with high %USED can imply over-commitment
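As a rough illustration of the rule above, the check below flags a VM only when both counters are high. The threshold values are assumptions chosen for the example, not figures from this presentation; tune them to your own environment.

```python
def may_be_overcommitted(pct_rdy, pct_used, rdy_threshold=10.0, used_threshold=80.0):
    """Flag possible CPU over-commitment when ready time and usage are both high.

    Both thresholds are hypothetical defaults, not VMware-published limits.
    """
    return pct_rdy >= rdy_threshold and pct_used >= used_threshold


# Hypothetical esxtop readings as (VM name, %RDY, %USED).
readings = [("web01", 12.8, 88.5), ("db01", 1.5, 92.0), ("file01", 15.0, 20.0)]
for name, rdy, used in readings:
    print(name, may_be_overcommitted(rdy, used))
```

High ready time with low usage (the last reading) points elsewhere, for example at a CPU limit on the VM, which the following slides cover.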
CPU - VI Client
Used Time > Ready Time: Possible CPU over-commitment
Used Time
Ready Time
CPU - Further Investigation
%MLMTD shows this VM has been limited
CPU - Further Investigation
High ready time caused by CPU resource limit
VMware Memory Management
• Transparent Page Sharing
• VMware Tools Balloon Driver to force the VM to swap to disk
• Virtual Machine Page File
Memory - Ballooning vs. Swapping
The balloon driver causes the guest OS to swap pages that it chooses to disk
ESX swapping will swap any pages to disk.
• Ballooning can be disabled (value 0) or controlled on a per Virtual Machine basis using: sched.mem.maxmemctl
• The default is 65%, and it can be controlled at host level.
• It is only an issue in resource contention scenarios (or for latency-sensitive VMs, e.g. Citrix).
Memory
Memory - Host
VI Client shows memory usage of the host. This is calculated as “consumed + overhead memory + Service Console”.
Performance charts are a very good way of showing the Virtual Machine memory breakdown.
• Consumed Memory
• Ballooned Memory
• Shared Memory
• Swapped Memory
Memory - Guest
Host Memory = Consumed + Overhead Memory
Guest Memory = Active Memory for Guest OS
Memory – Guest Overhead
Memory
Metric                  Description
Memory Active (KB)      Physical pages touched recently by a VM
Memory Usage (%)        Active memory / configured memory
Memory Consumed (KB)    Machine memory mapped to a virtual machine, including its portion of shared pages. Doesn’t include overhead memory.
Memory Granted (KB)     Physical pages allocated to a virtual machine. May be less than configured memory. Includes shared pages. Doesn’t include overhead memory.
Memory Shared (KB)      Physical pages shared with other virtual machines
Memory Balloon (KB)     Physical memory ballooned from a virtual machine
Memory Swapped (KB)     Physical memory in swap file (approx. “swap out – swap in”). Swap out and swap in are cumulative.
Overhead Memory (KB)    Machine pages used for virtualisation
Virtual Machine Memory Metrics – VI Client
Memory
Metric                   Description
Memory Active (KB)       Physical pages touched recently by the host
Memory Usage (%)         Active memory / configured memory
Memory Consumed (KB)     Total host physical memory – free memory on host. Includes Overhead and Service Console memory.
Memory Granted (KB)      Sum of physical pages allocated to all virtual machines. Doesn’t include overhead memory.
Memory Shared (KB)       Physical pages shared by virtual machines on host
Shared Common (KB)       Total machine pages used by shared pages
Memory Balloon (KB)      Machine pages ballooned from virtual machines
Memory Swap Used (KB)    Physical memory in swap file (approx. “swap out – swap in”). Swap out and swap in are cumulative.
Overhead Memory (KB)     Machine pages used for virtualisation
Host Memory Metrics – VI Client
Memory - esxtop
PMEM: Total physical memory breakdown
VMKMEM: Memory managed by vmkernel
COSMEM: Service Console memory breakdown
PSHARE: Page sharing statistics
SWAP: Swap statistics
MEMCTL: Balloon driver data
Memory
VI Client          esxtop
Active Memory      TCHD
Memory Usage       %ACTV
Consumed Memory    N/A
Memory Granted     N/A (SZTGT and CMTTGT represent memory scheduler targets)
Memory Shared      SHRD (+SHRDSVD per VM). Must enable COW stats in esxtop
Memory Balloon     MCTLSZ
Memory Swapped     SWCUR (SWR/s & SWW/s are rates)
Overhead Memory    OVHD & OVHDMAX
esxtop / VI Client metrics : Virtual Machines
Memory
VI Client               esxtop
Memory Active           N/A (try /proc/vmware/sched/mem-verbose)
Memory Usage            N/A (try /proc/vmware/sched/mem-verbose)
Memory Consumed         PMEM total – PMEM free
Memory Granted          N/A (SZTGT and CMTTGT represent memory scheduler targets)
Memory Shared           PSHARE (shared)
Memory Shared Common    PSHARE (common)
Memory Balloon          MEMCTL
Memory Swap Used        SWAP (r/s and w/s are rates)
Overhead Memory         OVHD & OVHDMAX
esxtop / VI Client metrics : Host Usage
Memory - VI Client memory usage graph
Memory - Troubleshooting memory usage issues
Networking
Network configuration is more likely to blame than resource contention
• Switch Assisted Teaming (IP Hash)
• VLAN Trunking
• Flow Control (full)
• Speed & Duplex (1000Mb / Full)
• Port Fast
• BPDU Disabled
• STP Disabled
• Link State Tracking
• Jumbo Frames
Networking - esxtop
Transmit and Receive in Mb/s
Transmit and Receive in Packets
Networking - esxtop
Dropped Packets Received
Dropped Packets Transmitted
Disk
Varying Factors
• File system performance
• Disk subsystem configuration (SAN, NAS, iSCSI, local disk)
• Disk caching
• Disk formats (thick, sparse, thin)
ESX Storage Stack
• Different latencies for different disks
• Queuing within the kernel
K: Kernel
D: Device
G: Guest
Disk
Quite Coarse Statistics
• Disk read / write rate (KB/s)
• Disk usage: sum of read BW and write BW (KB/s)
• Disk read / write requests (per 20s interval)
• Bus resets / Command aborts (per 20s interval)
• Per LUN or aggregated stats
VI Client statistics
Disk
Aggregated stats similar to VI Client
• Disk read / write per sec (READS/s, WRITES/s)
• MB read / write per sec (MBREAD/s, MBWRTN/s)
Latency Statistics
• Kernel Average / command (KAVG/cmd)
• Device Average / command (DAVG/cmd)
• Guest Average / command (GAVG/cmd)
Queuing Information
• Adapter Queue Length (AQLEN)
• LUN Queue Length (LQLEN)
• VMKernel (QUED)
• Active Queue (ACTV)
• %Used (%USD = ACTV/LQLEN)
esxtop statistics
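The two relationships behind these counters can be sketched as follows. The %USD formula is the one given on the slide; treating guest latency as roughly kernel latency plus device latency is the usual reading of the K / D / G breakdown, stated here as an approximation rather than an exact identity. The function names and sample values are illustrative.

```python
def lun_queue_used_pct(actv, lqlen):
    """%USD as defined on the slide: active commands as a percentage of LUN queue length."""
    return 100.0 * actv / lqlen


def approx_guest_latency_ms(kavg_ms, davg_ms):
    """Approximate GAVG/cmd as kernel latency (KAVG) plus device latency (DAVG)."""
    return kavg_ms + davg_ms


# Hypothetical readings: 16 active commands on a LUN with a queue length of 32,
# 2 ms spent in the kernel and 10 ms at the device.
print(lun_queue_used_pct(16, 32))
print(approx_guest_latency_ms(2.0, 10.0))
```

A rising KAVG with a steady DAVG suggests queuing inside the host, while a high DAVG points at the array or the fabric.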
Disk - SAN Rough Estimates
Purely looking at a single ESX host, roughly:
Throughput (in MBps) = (Outstanding IOs * Block size in KB) / latency in msec
FC, rough maximums:
Effective Link Bandwidth = ~80-90% of Real Bandwidth
Effective (2Gbps) = 200 – 230 MBps
Effective (4Gbps) = 410 – 460 MBps
Effective (8Gbps) = 820 – 920 MBps
iSCSI / NFS / FCoE, rough maximums:
Effective Link Bandwidth = ~70-80% of Real Bandwidth
Effective (1GigE) = 90 – 100 MBps
Effective (10GigE) = 900 – 1000 MBps
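The throughput estimate above is easy to try out in code. The sketch below uses hypothetical workload numbers (32 outstanding IOs, 32 KB blocks, 10 ms latency) and compares the result with the rough 1GigE ceiling quoted on the slide; the function name is illustrative.

```python
def throughput_mbps(outstanding_ios, block_size_kb, latency_ms):
    """Rough single-host throughput: (outstanding IOs * block size in KB) / latency in msec."""
    return outstanding_ios * block_size_kb / latency_ms


# Hypothetical workload: 32 outstanding IOs of 32 KB at 10 ms latency.
estimate = throughput_mbps(32, 32, 10)

# Upper end of the slide's rough effective range for 1GigE (90 to 100 MBps).
GIGE_EFFECTIVE_MAX_MBPS = 100
link_limited = estimate > GIGE_EFFECTIVE_MAX_MBPS
print(round(estimate, 1), link_limited)
```

When the estimate exceeds the effective link bandwidth, the link rather than the storage latency becomes the limiting factor, so the formula should be read as an upper bound.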
Disk - Desired Latency Calculations
Desired Latency in msec <= (Outstanding IOs * Block size in KB) / Throughput per host
Example:
Number of Hosts: 16
Effective Link Bandwidth: 90 MBps
Throughput per host: 90 / 16 = 5.6 MBps
Desired Latency: (32 * 32) / (5.6) = 182.86 msec

Workload                   Cached Sequential Read    Cached Sequential Write
Desired Latency (msec)     182.86                    182.86
Observed Latency (msec)    ~350                      ~180
Throughput Drop?           Yes                       No
Throughput (MBps)          ~45                       ~90
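The worked example can be reproduced with a short script. It assumes that the two 32s in the slide's calculation are 32 outstanding IOs and a 32 KB block size, and it rounds the per-host throughput to one decimal place as the slide does. Function names are illustrative.

```python
def desired_latency_ms(outstanding_ios, block_size_kb, throughput_per_host_mbps):
    """Latency ceiling: (outstanding IOs * block size in KB) / throughput per host."""
    return outstanding_ios * block_size_kb / throughput_per_host_mbps


def meets_target(observed_ms, target_ms):
    """True when the observed latency is at or below the desired latency."""
    return observed_ms <= target_ms


hosts = 16
effective_link_mbps = 90
per_host_mbps = round(effective_link_mbps / hosts, 1)  # rounded as on the slide
target_ms = desired_latency_ms(32, 32, per_host_mbps)

# Observed latencies taken from the table (approximate values).
for workload, observed_ms in (("cached sequential read", 350),
                              ("cached sequential write", 180)):
    print(workload, round(target_ms, 2), meets_target(observed_ms, target_ms))
```

The read workload misses the target and is the one showing a throughput drop in the table, while the write workload stays just under it.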
Disk - VI Client
SAN cache disabled: poor throughput
SAN cache enabled: high throughput
Disk - esxtop
Latency is quite high
After enabling cache, latency is reduced
Virtual Machine Optimisation
Deploy all machines from an optimised template!
• VMware Tools MUST be installed
• The disks MUST be block aligned to the storage (even when using NFS and SAN)
• Where possible, always separate data disks from OS disks
• Windows performance settings should be optimised for application performance
• Guest operating system timeouts should be set as defined by the SAN vendor
• Pagefile should be separated where appropriate (this can impact VMware SRM however)
• Unused Windows services should be disabled (wireless config, print spooler, audio, etc.)
• Last access update time should be disabled (except where required)
• Logging of the VM should be disabled (only enabled for troubleshooting)
• Remove any unused virtual hardware (floppy drives, USB, etc.)
• Disable screen savers and power saving features, including logon screen saver
• Enable Remote Desktop, avoid using the VI Client for remote administration
• Install standard applications into template (bginfo, AntiVirus, any host agents, etc.)
• Multiple CPUs should be allocated sparingly
Virtual Machine Optimisation
Block alignment is vital to good disk performance!
esxtop
Command    Action
space      Update the display
?          Show the help page
q          Quit
f / F      Add or remove columns from the display
o / O      Change the order the display is sorted
s          Change the update interval
#          Change the number of instances to display
W          Write configuration to file
e          Expand / roll up CPU stats
V          View only VM instances
L          Change the length of the NAME field
m          Display memory statistics
n          Display network statistics
i          Display interrupt statistics
d          Display disk adapter statistics
u          Display disk device statistics
v          Display disk VM statistics
Command Options when inside esxtop
esxtop
Command    Action
-b         batch mode
-l         locks the objects available in the first snapshot
-s         enables secure mode
-a         show all statistics
-c         sets the configuration file
-R         enables replay mode (used with “vm-support -S”)
-d         sets the update interval
-n         runs esxtop for n iterations
Command Line Options from the console
esxtop
Expand the default window size for your session to get all statistics
vm-support
Creates a packaged zip file containing the following sections:
• boot
– contains the grub configuration
• etc
– contains the Console OS configuration files (cron, tcpwrappers, syslog, etc.)
• proc
– contains much of the hardware configuration modules and variables
• tmp
– contains a lot of the ESX specific configuration output
• var
– contains log files and any core dumps
• vmfs
– contains the structure of the VMFS datastores
• esx3-installation (where appropriate)
– contains a copy of the previous esx3 configuration variables
vm-support
Using vm-support to extract performance information:
vm-support -S -d <duration> -i <interval>
<duration> and <interval> are in seconds
The output from this can then be replayed in esxtop for review after it has been extracted.
esxtop -R <path_to_vm-support_output>
Service Console Performance
• Multiple Service Console networks – for network resiliency
• Increased Service Console memory – up to 800MB
• Use host agents supplied by your vendors
• Make the storage vendor's recommended tweaks, such as HBA Queue Depth and IO timeouts
• Minimal use of the VI Client console – use RDP or SSH instead
• Properly sized vCenter server – 64bit OS where possible
Resource Groups
Dynamically reallocate resource shares
Add a VM: shares allow you to over-commit resources and still get a graceful re-allocation
Remove a VM: the freed resources are exploited across all remaining VMs
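The share behaviour described here can be illustrated with a simple proportional split. This is a deliberately simplified model: it ignores reservations, limits and idle VMs, and the capacity, VM names and share values are hypothetical.

```python
def allocate_by_shares(capacity_mhz, shares_by_vm):
    """Split capacity between VMs in proportion to their shares (contention case only)."""
    total_shares = sum(shares_by_vm.values())
    return {vm: capacity_mhz * shares / total_shares
            for vm, shares in shares_by_vm.items()}


# Hypothetical resource pool with 8000 MHz to hand out under contention.
pool = {"web01": 2000, "db01": 1000, "file01": 1000}
print(allocate_by_shares(8000, pool))

# Adding a VM shrinks every allocation gracefully...
pool["app01"] = 1000
print(allocate_by_shares(8000, pool))

# ...and removing one hands the freed capacity back to the remaining VMs.
del pool["web01"]
print(allocate_by_shares(8000, pool))
```

Because only the ratios matter, no VM's settings need to change when another VM joins or leaves the pool.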
Design Guidelines
• Full Resilience / Multiple paths
• Standard configuration across all aspects (ESX, Storage, Networking, etc.)
• Standard naming conventions
• Learn from others' mistakes
• Follow vendors' best-practice guidelines
• Rule out the basics before requesting support
Capacity Planner & P2V Cautions and Limitations
• Peak CPU usage can sometimes be misleading
• Back-end storage system performance
• P2V machines will require block-aligning to the storage
• P2V machines will still require guest OS optimisation
Conclusion
• Performance issues can often be traced with simple root cause analysis using basic tools (VI Client / esxtop)
• Performance tools help diagnose issues and help rule out non-issues
• Performance tools are useful in different contexts, not always either/or
– Real-time data and troubleshooting: esxtop
– Historical data: VI Client
– Coarse resource / cluster usage: VI Client
– Detailed resource usage: esxtop
• Combine information from various tools to get a complete picture
• Always benchmark your systems first so you know the optimal performance you can expect
Reference Articles
• http://www.vmware.com/pdf/esx3_memory.pdf
• http://www.vmworld.com/docs/DOC-2370
• http://blogs.vmware.com/performance/
• http://communities.vmware.com/docs/DOC-5420
• http://kb.vmware.com/kb/1008205
• http://communities.vmware.com/community/vmtn/general/performance
• http://www.vmware.com/products/vmmark/
• http://www.vmware.com/pdf/vsphere4/r40/vsp_40_san_cfg.pdf
• http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf
• http://www.vmware.com/pdf/vsphere4/r40/vsp_40_resource_mgmt.pdf
• http://www.vmware.com/pdf/GuestOS_guide.pdf
• http://www.vmware.com/resources/techresources/10066
• http://www.vmware.com/resources/techresources/10059
• http://www.vmware.com/resources/techresources/10062