vSphere performance

vSphere performance best practices by VMware

Transcript of vSphere performance

  • VSP1800

    vSphere Performance Best Practices

    Robert Moran

    Premier Services Engineer VMware, Inc. Global Support Services Cork, Ireland

  • 2

    Disclaimer

    This session may contain product features that are currently under development.

    This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product.

    Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

    Technical feasibility and market demand will affect final delivery.

    Pricing and packaging for any new technologies or features discussed or presented have not been determined.

  • 3

    Global Support Services and Customer Advocacy

    Support offices: Bangalore, India; Tokyo, Japan; Cork, Ireland; Burlington, Canada; Palo Alto, CA; Broomfield, CO

    Local language support: Spanish, Portuguese, French, German, Japanese, Chinese

    Global coverage: 24x7, 365 days/year

    6 Support Centers

    1000+ Support Engineers

    Follow-the-sun support for Severity 1 issues

    Support relationships with 100% of the Fortune 100; 99% of Fortune 500

  • 4

    Customer Support Day Events

    Coming to a location near you: sharing of VMware best practices!

    Support Days are a collaboration between VMware Support, Sales, and customers; you learn directly from the experts

    Topics are driven by customer input, and typically include:

    Best practices

    Tips/tricks

    Top issues

    Product roadmaps/demos

    Certification offerings

    http://www.vmware.com/go/supportdays

  • 5

    Overview

    What a performance problem sounds like:

    My VM is running slow and I don't know what to do!

    I tried adding more memory and CPUs but the problem got worse!

    My VM is slow on one host but fast on another!

    What to look for? Where to start?

    We will explore some of the most common performance-related issues that our support centers receive cases for

  • 6

    A word about performance.

    Troubleshooting methodology must define:

    How to find root cause

    How to fix the problem

    Must answer these questions:

    1. How do we know when we are done?

    2. Where do we start looking for problems?

    3. How do we know what to look for to identify a problem?

    4. How do we find the root-cause of a problem we have identified?

    5. What do we change to fix the root-cause?

    6. Where do we look next if no problem is found?

  • 7

    Agenda

    Benchmarking & Tools

    Best Practices and Troubleshooting

    The 4 food groups:

    Memory

    CPU

    Storage

    Network

  • 2009 VMware Inc. All rights reserved

    BENCHMARKING & TOOLS

  • 9

    Benchmarking

    Consistent and reproducible results

    Important to have a base level of acceptable performance:

    Expectation vs. Acceptable

    Determine baseline of performance prior to deployment:

    Benchmark on a physical system if applicable

    Avoid subjective metrics, stay quantitative. Examples of subjective statements to avoid:

    The system seems slower

    This worked better last year
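
    A minimal Python sketch of staying quantitative: it summarizes two sets of response-time samples and reports the change against the baseline. The sample values and function names are hypothetical and only illustrate the idea.

```python
import statistics


def summarize(samples_ms):
    """Return the mean and nearest-rank 95th percentile of latency samples."""
    ordered = sorted(samples_ms)
    rank = max(1, -(-95 * len(ordered) // 100))  # ceil(0.95 * n)
    return {"mean": statistics.mean(ordered), "p95": ordered[rank - 1]}


def change_pct(baseline, current):
    """Percentage change of the current mean relative to the baseline mean."""
    return (current["mean"] - baseline["mean"]) / baseline["mean"] * 100.0


# Hypothetical response times in milliseconds, before and after a change.
baseline = summarize([10, 12, 11, 13, 10, 12, 11, 14, 10, 12])
current = summarize([15, 18, 16, 17, 15, 19, 16, 18, 15, 17])
print(f"mean response time changed by {change_pct(baseline, current):.1f}%")
```

    Recording numbers like these before deployment gives a baseline to compare against later.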

  • 10

    Benchmarking

    Benchmarking should be done at the application layer:

    Use application-specific benchmarking tools and load generators

    Check with the application vendor

    Isolate variables, benchmark optimum situation before introducing load

    Understand dependencies:

    Human interaction

    Other food groups

    Compare apples-to-apples

  • 11

    Tools - vCenter Operations

    Aggregates thousands of metrics into Workload, Capacity, Health scores

    Self-learns normal conditions using patented analytics

    Smart alerts of impending performance and capacity degradation

    Identifies potential performance problems before they start

  • 12

    Tools - vCenter Operations

  • 13

    Tools - esxtop

    Valuable tool built into vSphere hosts

    View or capture real-time data:

    View or play back data later

    Import data into 3rd party tools

    vSphere Client performance graphs get their data from the kernel and VSI

    Presentation/unit may be different (e.g. %RDY)

    http://communities.vmware.com/docs/DOC-9279
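
    esxtop can write its counters to a CSV file in batch mode (for example `esxtop -b -d 5 -n 60 > capture.csv`), which is what makes importing into other tools possible. A rough Python sketch of averaging one counter from such a capture follows. The host name, VM names, values, and exact header text are illustrative assumptions; check them against a real capture.

```python
import csv
import io

# Hypothetical capture: header names imitate esxtop batch output, values are made up.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(1001:web01)\% Ready","\\esx01\Group Cpu(1002:db01)\% Ready"',
    '"01/01/2013 10:00:00","2.5","12.0"',
    '"01/01/2013 10:00:05","3.5","14.0"',
])


def average_by_counter(csv_text, counter_suffix):
    """Average every column whose header ends with counter_suffix."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    averages = {}
    for idx, name in enumerate(header):
        if name.endswith(counter_suffix):
            values = [float(row[idx]) for row in data if row[idx] != ""]
            averages[name] = sum(values) / len(values)
    return averages


for column, value in average_by_counter(SAMPLE, "% Ready").items():
    print(column, value)
```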

  • 2009 VMware Inc. All rights reserved

    MEMORY

  • 15

    Memory - Overhead

    A VM's RAM is not necessarily machine RAM:

    vRAM + overhead = maximum machine RAM

    Source: vSphere 5.1 Resource Management Guide

    Note: These are estimated values
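
    The relationship above, written as a tiny Python sketch. The overhead figure is a made-up placeholder, not a value from the Resource Management Guide table.

```python
def max_machine_memory_mb(vram_mb, overhead_mb):
    """Upper bound on the machine RAM a VM can consume: vRAM + overhead."""
    return vram_mb + overhead_mb


# Hypothetical VM: 4096 MB of vRAM with an assumed 60 MB of overhead.
print(max_machine_memory_mb(4096, 60))
```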

  • 16

    Memory - Host Memory Management

    Occurs under normal circumstances and when there is contention:

    Transparent Page Sharing

    Occurs when memory is under contention:

    Ballooning

    Compression

    Swapping

  • 17

    Memory - Transparent Page Sharing

  • 18

    Memory - Ballooning

  • 19

    Memory - Compression

  • 20

    Memory - Swapping

  • 21

    Memory - Swapping

  • 22

    Memory - Ballooning vs. Swapping

    Ballooning is better than swapping:

    Guest can surrender unused/free pages

    Guest chooses what to swap, can avoid swapping hot pages

  • 23

    Memory - VM Resource Allocation

  • 24

    Memory - Resource Pool Allocation

  • 25

    Memory - Rightsizing

    Generally it is better to OVER-allocate than UNDER-allocate

    If the running VMs are consuming too much host/pool memory:

    Some VMs may not get physical memory

    Ballooning or host swapping

    Higher disk IO

    All VMs slow down

  • 26

    Memory - Rightsizing

    If a VM has too little vRAM:

    Applications suffer from lack of RAM

    The guest OS swaps

    Increased disk traffic, thrashing

    SAN slowdown as a result of increased disk traffic

    If a VM has too much vRAM:

    Higher overhead memory

    Possible decreased failover capacity

    Longer vMotion time

    Larger VSWP file

    Wasted resources

  • 27

    Memory - Troubleshooting

    Wrong resource allocation:

    May not notice a limit, e.g. VM or template with a limit gets cloned

    Custom share values

    Ballooning or swapping at the host level:

    Ballooning is a warning sign, not a problem

    Swapping is a performance issue if seen over an extended period

    Swapping/paging at the guest level:

    Under-provisioned guest memory

    Missing balloon driver (Tools)

  • 28

    Memory - Best Practices

    Avoid high active host memory over-commitment:

    No host swapping occurs when total memory demand is less than the physical memory (assuming no limits)

    Right-size guest memory:

    Avoid guest OS swapping

    Ensure there is enough vRAM to cover demand peaks

    Use a fully automated DRS cluster:

    Use Resource Pools with High/Normal/Low shares

    Avoid using custom shares
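
    The over-commitment rule on this slide can be checked with simple arithmetic. A minimal Python sketch, assuming no limits are set and using hypothetical per-VM demand figures:

```python
def host_swapping_possible(active_demand_mb, host_physical_mb):
    """True when total memory demand reaches or exceeds physical memory."""
    return sum(active_demand_mb) >= host_physical_mb


# Hypothetical host with 16 GB of physical memory.
print(host_swapping_possible([2048, 4096, 1024], 16384))
print(host_swapping_possible([8192, 8192, 4096], 16384))
```

    The first call stays under physical memory, so host swapping is not expected; the second does not.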

  • 2009 VMware Inc. All rights reserved

    CPU

  • 30

    CPU - Overview

    Raw processing power of a given host or VM:

    Hosts provide CPU resources

    VMs and Resource Pools consume CPU resources

    CPU cores/threads need to be shared between VMs

    Fair scheduling:

    vCPU time

    Hardware interrupts for a VM

    Parallel processing for SMP VMs

    I/O

  • 31

    CPU - esxtop

  • 32

    CPU - esxtop

    Interpret the esxtop columns correctly:

    %RDY - The percentage of time a VM is ready to run, but no physical processor is ready to run it

    %USED - Physical CPU usage

    %SYS - Percentage of time in the VMkernel

    %IDLE - %WAIT minus %IDLE can be used to estimate I/O wait time
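
    The I/O wait estimate from the last column description, as a short Python sketch. The counter values are hypothetical.

```python
def estimated_io_wait_pct(wait_pct, idle_pct):
    """Estimate I/O wait as %WAIT minus %IDLE, never going below zero."""
    return max(0.0, wait_pct - idle_pct)


# Hypothetical esxtop readings for one VM.
print(estimated_io_wait_pct(95.0, 80.0))
```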

  • 33

    CPU - Performance Overhead & Utilization

    Different workloads have different overhead costs (%SYS) even for the same utilization (%USED)

    CPU virtualization adds varying amounts of system overhead:

    Direct execution vs. privileged execution

    Non-paravirtual adapters vs. paravirtual adapters

    Virtual hardware (Interrupts!)

    Network and storage I/O

  • 34

    CPU - vSMP

    Relaxed Co-Scheduling: vCPUs can run out-of-sync

    Idle vCPUs incur a scheduling penalty:

    Configure only as many vCPUs as needed

    Extra vCPUs impose unnecessary scheduling constraints

    Use Uniprocessor VMs for single-threaded applications

  • 35

    CPU - Scheduling

    Over-committing physical CPUs

    VMkernel CPU Scheduler

  • 36

    CPU - Scheduling

    Over-committing physical CPUs

    VMkernel CPU Scheduler

  • 37

    CPU - Scheduling

    Over-committing physical CPUs

    VMkernel CPU Scheduler

  • 38

    CPU - Ready Time

    The percentage of time that a vCPU is ready to execute, but waiting for physical CPU time

    Does not necessarily indicate a problem:

    Indicates possible CPU contention or limits
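
    The vSphere Client charts report ready time as a summation in milliseconds per sample interval, while esxtop shows a percentage. A minimal Python sketch of the conversion, assuming the 20-second real-time chart interval; the sample values are hypothetical.

```python
def ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a ready-time summation (ms) into a per-vCPU percentage."""
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


# Hypothetical: 1000 ms of ready time in one 20-second real-time sample.
print(ready_percent(1000))           # single-vCPU VM
print(ready_percent(1000, vcpus=4))  # averaged over 4 vCPUs
```

    Dividing by the vCPU count gives an average per vCPU, which is easier to compare between VMs of different sizes.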

  • 39

    CPU - NUMA nodes

    Non-Uniform Memory Access system architecture

    Each node consists of CPU cores and memory

    A CPU core in one NUMA node can access memory in another node, but at a small performance cost

    vNUMA is now available in vSphere 5

    NUMA node 1 NUMA node 2
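
    One practical consequence is that a VM sized to fit inside a single node can avoid the remote-access cost. A minimal Python sketch of that check follows; the node and VM sizes are hypothetical.

```python
def fits_in_one_node(vm_vcpus, vm_mem_gb, cores_per_node, mem_per_node_gb):
    """True when both the vCPUs and the memory of a VM fit in one NUMA node."""
    return vm_vcpus <= cores_per_node and vm_mem_gb <= mem_per_node_gb


# Hypothetical host: 2 nodes, each with 8 cores and 64 GB of memory.
print(fits_in_one_node(4, 32, 8, 64))
print(fits_in_one_node(12, 32, 8, 64))
```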

  • 40

    CPU - vNUMA

    Virtual NUMA (vNUMA) exposes host NUMA topology to the guest operating system

    Requires hardware version 8

    Enabled by default on VMs with more than 8 vCPUs:

    VMs with 8 or fewer vCPUs need to have their advanced configuration edited to enable vNUMA

  • 41

    CPU - Power Management

    Can be set from the vSphere Client

  • 42

    CPU - Troubleshooting

    vCPU to pCPU over-allocation:

    HyperThreading does not double CPU capacity!

    Limits or too many reservations can create artificial limits

    Expecting the same consolidation ratios with different workloads:

    Virtualizing easy systems first, then expanding to heavier systems

    Compare apples to apples:

    Frequency, turbo, cache sizes, cache sharing, core count, instruction set
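
    A quick way to quantify vCPU to pCPU allocation is a ratio against physical cores. Because HyperThreading does not double capacity, this Python sketch counts cores rather than logical threads; the VM sizes are hypothetical.

```python
def vcpu_to_core_ratio(vcpus_per_vm, physical_cores):
    """Ratio of allocated vCPUs to physical cores (not logical threads)."""
    return sum(vcpus_per_vm) / physical_cores


# Hypothetical host with 8 physical cores running five VMs.
print(vcpu_to_core_ratio([2, 4, 4, 2, 4], 8))
```

    What ratio is acceptable depends on the workload, which is why the same consolidation ratio cannot be assumed for different applications.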

  • 43

    CPU - Best Practices

    Right-size vSMP VMs

    Keep heavy-hitters separated:

    Fully automated DRS should do this for you

    Use anti-affinity rules if necessary

    Use a fully automated DRS cluster:

    Use Resource Pools with High/Normal/Low shares

    Avoid using custom shares

  • 2009 VMware Inc. All rights reserved

    STORAGE

  • 45

    Storage - esxtop Counters

    Different esxtop storage views:

    Adapter (d)

    VM (v)

    Disk Device (u)

    Key Fields:

    DAVG + KAVG = GAVG

    QUED/USD - Command Queue Depth

    CMDS/s - Commands Per Second

    MBREAD/s

    MBWRTN/s

  • 46

    Storage - Troubleshooting with esxtop

    High DAVG: issue beyond the adapter

    Over-utilized storage processors, too few platters in the RAID set, etc.

    High KAVG: issue in the kernel storage stack

    Driver issue

    Full queue

    Aborts: GAVG exceeding 5000 ms

    Command will be repeated, storage delay for the VM
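
    The reasoning on this slide can be sketched as a small Python check. The 5000 ms abort figure is from the slide; the DAVG and KAVG warning thresholds are assumptions chosen for illustration, so tune them for your array.

```python
ABORT_MS = 5000.0  # from the slide: commands are aborted beyond this GAVG


def classify_latency(davg_ms, kavg_ms, davg_warn_ms=25.0, kavg_warn_ms=2.0):
    """Return GAVG and a list of findings. The warn thresholds are assumed."""
    gavg_ms = davg_ms + kavg_ms  # DAVG + KAVG = GAVG
    findings = []
    if gavg_ms > ABORT_MS:
        findings.append("abort risk: command will be repeated")
    if davg_ms > davg_warn_ms:
        findings.append("high DAVG: look beyond the adapter")
    if kavg_ms > kavg_warn_ms:
        findings.append("high KAVG: look at the kernel storage stack")
    return gavg_ms, findings


print(classify_latency(40.0, 0.5))
print(classify_latency(5.0, 8.0))
```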

  • 47

    Storage - Benchmarking with Iometer

  • 48

    Storage - Storage I/O Control

    Allows the use of Shares per VMDK

    Throttling occurs when datastore reaches latency threshold:

    Higher share VMDKs perform IO first

    vCenter monitors latency across all hosts:

    Not effective if datastore shared with other vCenters
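
    A simplified Python sketch of share-based throttling: once latency crosses a threshold, a fixed pool of queue slots is divided in proportion to shares. This illustrates proportional sharing only and is not how Storage I/O Control is implemented; the 30 ms threshold, the slot count, and the VMDK names are assumptions.

```python
def divide_slots(shares_by_vmdk, total_slots, latency_ms, threshold_ms=30.0):
    """Split queue slots by shares, but only when latency is over threshold."""
    if latency_ms < threshold_ms:
        return None  # no contention, so no throttling is applied
    total_shares = sum(shares_by_vmdk.values())
    return {
        name: total_slots * shares / total_shares
        for name, shares in shares_by_vmdk.items()
    }


# Hypothetical VMDKs: one with High shares and two with Normal shares.
shares = {"db.vmdk": 2000, "web.vmdk": 1000, "test.vmdk": 1000}
print(divide_slots(shares, 64, latency_ms=35.0))
print(divide_slots(shares, 64, latency_ms=10.0))
```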

  • 49

    Storage - Storage DRS

    Datastore clusters:

    Maintenance mode

    Anti-affinity rules

    vCenter monitors for latency and disk space:

    Migrate VMDKs for better performance or utilization

    Not effective with automated tiering SANs:

    Check HCL to confirm these features are compatible

  • 50

    Storage - Troubleshooting

    Snapshots

    Excessive traffic down one HBA / Switch / SP can cause latency:

    Consider using Round Robin in conjunction with ALUA

    Always be paranoid when it comes to monitoring storage I/O

    Consider your I/O patterns:

    Peak time for storage IO?

    Virus scans, database maintenance, user logins

    Always consult with array vendor:

    They know the best practices for their array!

  • 51

    Storage - Best Practices

    Use different tiers of storage for different VM workloads:

    Slower storage for OS VMDKs

    Faster storage for databases or other high-IO applications

    Use the Paravirtual SCSI adapter:

    Reduced overhead, higher throughput

    Use path balancing where possible, either through 3rd party plugins or through Round Robin and ALUA, if supported

    Use Storage DRS with SIOC:

    Balance for both free space and latency

    Simplified datastore management

  • 2009 VMware Inc. All rights reserved

    NETWORK

  • 53

    Network - Load Balancing

    Load balancing defines which uplink is used:

    Route based on Port ID

    Route based on IP hash

    Route based on MAC hash

    Route based on NIC load (Load Based Teaming)

    Probability of high-bandwidth VMs being on the same physical NIC

    Traffic will stay on elected uplink until an event occurs:

    NIC link state change, adding/removing NIC from a team, beacon probe timeout
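
    A simplified Python sketch of how two of these policies pick an uplink. The modulo and XOR calculations are illustrative assumptions about the general approach, not the exact vSwitch algorithm; the port numbers and addresses are hypothetical.

```python
import ipaddress


def uplink_by_port_id(port_id, uplink_count):
    """Port ID policy: a virtual port always maps to the same uplink."""
    return port_id % uplink_count


def uplink_by_ip_hash(src_ip, dst_ip, uplink_count):
    """IP hash policy: the uplink depends on both source and destination."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplink_count


# With port ID, one VM stays on one uplink regardless of destination.
print(uplink_by_port_id(7, 2))
# With IP hash, the same VM can use different uplinks per destination.
print(uplink_by_ip_hash("10.0.0.1", "10.0.0.2", 2))
print(uplink_by_ip_hash("10.0.0.1", "10.0.0.3", 2))
```

    This is why two high-bandwidth VMs can land on the same physical NIC under the static policies, while Load Based Teaming reacts to actual NIC load.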

  • 54

    How to Check Network Performance

    VM to VM on the same ESXi host: this excludes physical network problems

    VM to VM on different ESXi hosts: this involves the physical NICs and switch as well

    Physical to VM: this also tests physical devices, but lets us focus on one VM

    Physical to physical: this gives a baseline number for what to expect

    Use iperf/jperf/netperf: free tools for network testing

  • 55

    Iperf

  • 56

    Iperf

    Windows and Linux versions are available

    Does not use storage

    Different test options are available (UDP/TCP)

    Automatically calculates bandwidth
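
    To compare runs across the four test scenarios, the bandwidth figure can be pulled out of the iperf summary line. A minimal Python sketch follows; the sample line imitates classic iperf output and its numbers are made up.

```python
import re

# Factors for converting each unit prefix to Mbit/s.
UNIT_TO_MBITS = {"": 1e-6, "K": 1e-3, "M": 1.0, "G": 1e3}


def bandwidth_mbits(summary_line):
    """Extract the reported bandwidth in Mbit/s, or None if it is absent."""
    match = re.search(r"([\d.]+)\s+([KMG]?)bits/sec", summary_line)
    if match is None:
        return None
    return float(match.group(1)) * UNIT_TO_MBITS[match.group(2)]


# Hypothetical summary line from a 10-second TCP test.
line = "[  3]  0.0-10.0 sec  1.10 GBytes   941 Mbits/sec"
print(bandwidth_mbits(line))
```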

  • 57

    Network - Troubleshooting

    Check counters for NICs and VMs:

    Network load imbalance

    10 Gbps NICs can incur a significant CPU load when running at 100%

    Ensure hardware supports TSO:

    Use latest drivers and firmware for your NIC on the host

    For multi-tier VM applications, use DRS affinity rules to keep VMs on the same host:

    Same vSwitch / VLAN, rules out physical network

    If using Jumbo Frames, ensure it is enabled end-to-end
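
    End-to-end means every hop must allow the large frame size, because the smallest MTU on the path decides. A minimal Python sketch of that check; the hop names and MTU values are hypothetical.

```python
def jumbo_end_to_end(mtu_by_hop, required_mtu=9000):
    """True only when every hop on the path supports the required MTU."""
    return min(mtu_by_hop.values()) >= required_mtu


# Hypothetical path where the physical switch was left at the default MTU.
path = {"guest NIC": 9000, "vSwitch": 9000, "physical switch": 1500, "array": 9000}
print(jumbo_end_to_end(path))
```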

  • 58

    Network - Best Practices

    Use the vmxnet3 virtual adapter:

    Less CPU overhead

    10 Gbps connection to vSwitch

    Use the latest driver/firmware for the NICs on the host

    Use network shares:

    Requires Virtual Distributed Switch 4.1

    Isolate vMotion and iSCSI traffic from regular VM traffic:

    Separate vSwitches with dedicated NIC(s)

    Most applicable with Gigabit NICs

  • 59

    In conclusion

  • 60

    Key Takeaways - Performance Best Practices

    Understand your environment:

    Hardware, storage, networking

    VMs & applications

    Advanced configuration values do not need to be tweaked or modified in almost all situations

    Use fully automated DRS

    Use Paravirtual hardware

  • 61

    Important Links

  • 62

    Important Links
