Presentation: vSphere 5 Storage Best Practices

vSphere 5 Storage Best Practices. Chad Sakac, EMC Corporation; Vaughn Stewart, NetApp. INF-STO2980 #vmworldinf

Transcript of Presentation: vSphere 5 Storage Best Practices

  • vSphere 5 Storage Best Practices

    Chad Sakac, EMC Corporation

    Vaughn Stewart, NetApp

    INF-STO2980

    #vmworldinf

  • The Great Protocol Debate

    Every protocol can be highly available, and generally, every protocol can meet a broad performance band. Each protocol has different configuration considerations. In vSphere, there is core feature equality across protocols.

  • The Great Protocol Debate

    [Chart: protocol usage poll results. Source: Virtual Geek poll]

  • The Great Protocol Debate

    NetApp AutoSupport data, July 2012: large-scale NetApp customers favor NAS.

  • The Great Protocol Debate

    [Bar chart: percentage of respondents that deployed each protocol (DAS, NFS, iSCSI, FC, FCoE, InfiniBand, AoE). Multiple selections allowed, n=158, selections=371. Source: Wikibon Survey, July 2012]

  • The Great Protocol Debate

    Every protocol can be highly available, and generally, every protocol can meet a broad performance band. Each protocol has different configuration considerations. In vSphere, there is core feature equality across protocols.

    Conclusion: there is no debate. Pick what works for you!

    The best flexibility comes from a combination of VMFS and NFS.

  • A Packed Agenda

    6 Key Things To Do:
    1. Leverage Key Docs
    2. Set Up Multipathing Right
    3. Alignment = Good Hygiene
    4. Leverage vCenter Plug-ins, VAAI, and VASA
    5. KISS Guidelines for Layout
    6. Use SDRS and SIOC if You Can

    What To Do When You Are In Trouble. When To Break The Rules. A Peek Into the Future.

  • Leverage Key Docs

    Key Best Practices 2012 #1

  • Key VMware Resources & Documents

    VMware Technical Resource Center, Storage Connectivity: Fibre Channel SAN Config Guide, iSCSI SAN Config Guide, Best Practices for NFS Storage

    Understand storage taxonomy: LUN ownership (Active/Active, Active/Passive, Virtual Port), multipathing (SAN / NAS)

    "Highly Recommended" is a kind way of saying "This Is Mandatory Reading"

    http://www.vmware.com/technical-resources/virtual-storage/resources.html

  • Key Partner Documents

    Storage varies far more vendor to vendor than servers do. Stay current on your array's Best Practices. Even if you're NOT the storage team, read them.

    NetApp: 7-Mode: Technical Report TR-3749; Cluster-Mode: Technical Report TR-4068
    EMC: VNX and vSphere TechBook (h8229), VMAX and vSphere TechBook (h2529), Isilon and vSphere Best Practices Guide (h10522)

    http://www.emc.com/collateral/hardware/technical-documentation/h8229-vnx-vmware-tb.pdf
    http://www.emc.com/collateral/hardware/solution-overview/h2529-vmware-esx-svr-w-symmetrix-wp-ldv.pdf
    http://www.emc.com/collateral/hardware/white-papers/h10522-bpg-isilon-and-vmware-vsphere5.pdf

  • Set Up Multipathing Right

    Key Best Practices 2012 #2

  • Understanding the vSphere Pluggable Storage Architecture

  • What's out of the box in vSphere? Path Selection Policies (PSP):

    Fixed (default for Active/Active arrays): I/O traverses the preferred path; reverts to the preferred path after a failure

    MRU (default for many Active/Passive arrays): I/O traverses the preferred path; remains on the alternative path after a failure

    Round Robin: I/O traverses all paths; default is 1000 IOPS per path; ALUA sets path preference. Notable change in vSphere 5.1: this is the default for EMC VNX (R32) and VMAX. If upgrading, claim rules are unchanged.

    To change a PSP: use your vendor's vCenter plug-in (easy), or the CLI, for example:

    esxcli storage nmp device set --device <device_id> --psp VMW_PSP_RR
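
    A minimal sketch of related commands from the ESXi 5.x shell. The SATP name, device ID, and IOPS value below are examples/placeholders only; follow your array vendor's guidance for the actual values.

    # Show which SATP/PSP currently claims each device
    esxcli storage nmp device list

    # Make Round Robin the default PSP for every device claimed by a given SATP
    # (VMW_SATP_ALUA_CX is shown as an example; use the SATP your array is claimed by)
    esxcli storage nmp satp set --satp VMW_SATP_ALUA_CX --default-psp VMW_PSP_RR

    # Optionally tune the Round Robin IOPS limit per device (1000 is the default;
    # some vendors recommend a lower value - check your array's best practices doc)
    esxcli storage nmp psp roundrobin deviceconfig set --device <device_id> --type iops --iops 1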

  • What is Asymmetric Logical Unit Access (ALUA)?

    ALUA allows paths to be profiled: Active (optimized), Active (non-optimized), Standby, Dead (target unreachable = APD), Gone Away (device unreachable = PDL)

    Ensures optimal path selection by the vSphere PSPs and 3rd-party MPPs

    [Diagram: a LUN behind SP A and SP B, reachable over optimized and non-optimized paths]

  • Understanding SAN Multipathing

    MPIO is based on initiator-target sessions, not links.

  • Multipathing with NFSv3

    [Diagram: an ESXi host with vmnic0/vmnic1 connected through redundant switches to storage ports on SP A and SP B, contrasting two designs]

    Design 1: NIC teaming with the "Route based on IP hash" load-balancing policy, cross-stack EtherChannel, switch-port static or dynamic link aggregation; active/active configuration; single switch or stacked switches with spanned, teamed switch ports (feature may not be available on all switches).

    Design 2: no link aggregation; use multiple vmkernel ports and subnets and allow the VMkernel to make routing decisions (a command-line sketch follows below).
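
    Where link aggregation is not available, a hedged sketch of the second design is to add an additional vmkernel port on its own subnet from the ESXi 5.x shell; the port group name, interface name, and addresses below are placeholders.

    # Create an additional vmkernel interface on an existing port group dedicated to NFS
    esxcli network ip interface add --interface-name vmk2 --portgroup-name <NFS-PG-2>
    # Give it a static address on the second storage subnet
    esxcli network ip interface ipv4 set --interface-name vmk2 --ipv4 <ip_on_subnet_2> --netmask <netmask> --type static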

  • Microsoft Cluster Service

    Unsupported storage configurations: FCoE, iSCSI & NFS datastores; Round Robin PSP; N-Port ID Virtualization (NPIV)

    Array vendors solve storage gaps: 3rd-party MPPs, guest-connected storage

    Other limits: memory overcommit, vMotion & Fault Tolerance

    vSphere 5.1 has expanded WSFC support: 4 nodes with disk quorum, 5 nodes when using MNS

  • 3rd Party Multi-Pathing Plugins (MPPs)

    Storage manageability: simple provisioning, predictable & consistent, optimized data-path utilization

    Performance and scale: tuned performance, predictive load balancing, automatic fault recovery

    3rd-party MPPs: EMC PowerPath/VE (now v5.8), Dell/EqualLogic PSP

    [Diagram: many VMs (APP/OS) across multiple ESXi hosts, each host running PowerPath/VE, all connected to shared storage]

  • General NFS Best Practices

    Use the EMC & NetApp vCenter plug-ins; they automate best practices. Note that vCenter plug-ins from 5.0 and earlier will NOT WORK with vSphere 5.1 (more on this later).

    Use multiple NFS datastores & 10GbE. 1GbE requires more complexity to address I/O scaling due to one data session per connection with NFSv3 (a mount example follows below).
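
    As a sketch, additional NFS datastores can also be mounted from the ESXi 5.x shell; the server name, export path, and datastore name here are placeholders.

    # Mount an NFS export as a datastore and confirm it is visible
    esxcli storage nfs add --host <nfs_server> --share </vol/datastore1> --volume-name <datastore1>
    esxcli storage nfs list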

  • General NFS Best Practices - Timeouts

    Configure the following on each ESX server (automated by the vCenter plug-ins; command-line equivalents are sketched below):
    NFS.HeartbeatFrequency = 12
    NFS.HeartbeatTimeout = 5
    NFS.HeartbeatMaxFailures = 10

    Increase Guest OS time-out values to match: HKLM > System > CurrentControlSet > Services > Disk. Select TimeOutValue and set the data value to 125 (decimal).

    Increase Net.TcpipHeapSize (follow vendor recommendation)
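
    If you are not using the plug-ins, the same host settings can be applied from the ESXi 5.x shell. The heap-size values below are placeholders only; follow your array vendor's recommendation.

    esxcli system settings advanced set -o /NFS/HeartbeatFrequency -i 12
    esxcli system settings advanced set -o /NFS/HeartbeatTimeout -i 5
    esxcli system settings advanced set -o /NFS/HeartbeatMaxFailures -i 10
    # Heap sizes are vendor-specific; 32/128 are placeholders, not recommendations
    esxcli system settings advanced set -o /Net/TcpipHeapSize -i 32
    esxcli system settings advanced set -o /Net/TcpipHeapMax -i 128
    # Verify a setting took effect
    esxcli system settings advanced list -o /NFS/HeartbeatMaxFailures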


  • What has changed in vSphere 5?

    Minor change in the NFS v3 client (not NFS v4, NFS v4.1, or pNFS): an FQDN specified in the datastore configuration is resolved via DNS lookup on ESXi boot, and DNS round robin is supported.

    Distribute NFS client logins across a vSphere 5 cluster for load balancing across multiple IPs: EMC Isilon, NetApp Data ONTAP 8 Cluster-Mode

  • Path Management with Scale-Out Arrays

    Storage arrays have target addresses: WWN / IQN / IP

    On a scale-out array these addresses are virtual and mapped to physical I/O ports

    LUNs leverage multipathing software to re-route as a LUN traverses controllers

    On NetApp Cluster-Mode, NFSv3 requires one IP per datastore; avoid hopping across the array to reach the physical disk

    On EMC Isilon, use SmartConnect

  • iSCSI & NFS Ethernet Jumbo Frames

    What is an Ethernet jumbo frame? Ethernet frames with more than 1500 bytes of payload (9000 is common; FCoE uses 2240)

    Commonly thought of as delivering better performance

    Should I use jumbo frames? They add complexity, and the performance gains (while existent) are relatively marginal with common block sizes

    Stick with the defaults when you can (a configuration sketch follows if you do need them)
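
    If you do decide to use jumbo frames, a minimal sketch of the host-side configuration on a standard vSwitch follows; the vSwitch and vmkernel names and the target IP are placeholders, and the physical switches and array ports must also be set to 9000.

    esxcli network vswitch standard set --vswitch-name vSwitch1 --mtu 9000
    esxcli network ip interface set --interface-name vmk1 --mtu 9000
    # Verify end-to-end with a large, non-fragmenting ping (8972 = 9000 minus IP/ICMP headers)
    vmkping -d -s 8972 <storage_array_ip>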

  • IP Storage: Using iSCSI & NFS Together

    iSCSI and NFS route differently: iSCSI uses vmknics with no Ethernet failover, using MPIO instead

    The NFS client relies on vmknics using link aggregation / Ethernet failover

    NFS relies on the host routing table

    Best practice is to have separate subnets & virtual interfaces for each

  • Optimize I/O, aka Alignment

    Key Best Practices 2012 #3

  • Alignment is Optimal I/O

    Misalignment of filesystems results in additional work on the storage controller to satisfy an I/O request. This affects every protocol and every storage array.

    VMFS & NFS datastores; VMDKs & RDMs with NTFS, EXT3, etc. Filesystems exist in both the datastore and the VMDK.

    [Diagram: I/O stack layers and their allocation unit sizes - guest filesystem clusters (4KB-1MB), VMFS blocks (1MB-8MB), array chunks (4KB-64KB)]


  • Disk Alignment

    Aligning I/O can deliver significant performance improvements for VMs with high disk I/O.

  • Alignment is Optimal I/O

    VMware, Microsoft, Citrix, NetApp, and EMC all agree: align partitions.

    Plug-and-play guest operating systems: Windows 2008, Vista, & Win7 (fresh installations only, no upgrades)

    Guest operating systems requiring manual alignment: Windows NT, 2000, 2003, & XP (use diskpart to set a 1MB offset); Linux (use fdisk expert mode and align on sector 2048 = 1MB); a quick check is sketched below
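
    A small sketch of how to check and create aligned partitions from inside a Linux guest; the device names are placeholders, and parted is shown only as a simpler alternative to fdisk expert mode.

    # List partition start sectors (512-byte sectors); a start that is a multiple of 2048
    # (2048 x 512B = 1MB) indicates an aligned partition
    fdisk -lu /dev/sda

    # Create a new, aligned partition starting at 1MiB
    parted /dev/sdb mklabel msdos
    parted /dev/sdb mkpart primary 1MiB 100%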

  • Fixing Misalignment

    If VMFS is misaligned: migrate the VMs & destroy the datastore

    If the guest OS filesystem is misaligned:
    Step 1: Take an array snapshot/backup
    Step 2: Use offline tools to realign: EMC UBERAlign (open, works with all arrays, includes a scheduler and in-guest reclaim), vSphere Migrator
    Alternate: Use an online tool to align: NetApp Migrate & Optimize (VSC feature)

  • Leverage Plug-ins, VAAI & VASA

    Key Best Practices 2012 #4

  • Where Does Integration Happen? circa 2012

    [Diagram: integration points between the VI Client / vCenter, the ESX storage stack, and the storage array]

    Vendor-specific vCenter plug-in: view VMware-to-storage relationships, provision datastores more easily, leverage array features (compress/dedupe, file/filesystem/LUN snapshots); vCOps connectors

    ESX storage stack: standards-based VAAI SCSI command support (FC, FCoE, iSCSI), vendor-specific VAAI NFS operation support (NFS VAAI module), vendor-specific VAAI block module (iSCSI/FCoE SW), vStorage API for Multipathing (NMP), vStorage API for Data Protection (VDDK), VASA module, datamover, VMware LVM, VMFS/NFS, network stack, HBA and NIC drivers

    Other integration points: VSS via VMware Tools snap requests, SvMotion requests, VM provisioning commands, turning thin provisioning on/off, vendor-specific vStorage API for SRM, VM object awareness in array/management tools

  • Where Does Integration Happen? circa 2012

    [Same diagram as the previous slide, annotated with "5.1 Change", "Inyo Change", and "New coolness" callouts marking the integration points that changed with vSphere 5.1]

  • New VAAI Stuff in vSphere 5.x

    VAAI Thin Provisioning (block) reclaim, used by the View 5.1 sparse VMDK format

    VAAI TP at the datastore level is disabled by default in vSphere 5.0 U1 and vSphere 5.1 (will be back on in future vSphere releases)

    VAAI TP reclaim using vmkfstools -k

    VAAI Fast Clone (file), used by View 5.1 and vCloud Director 5.1: depends on file-level snaps; hardware-accelerated linked clone (a verification sketch follows below)
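
    A hedged way to confirm VAAI support from the ESXi 5.x shell; the device ID is a placeholder.

    # Show which VAAI primitives the device reports (ATS, Clone, Zero, Delete)
    esxcli storage core device vaai status get -d <naa_device_id>
    # Confirm the host-side offload options are enabled (1 = enabled)
    esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove
    esxcli system settings advanced list -o /VMFS3/HardwareAcceleratedLocking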

  • VAAI NFS Demo

  • vCenter Plug-ins

    First gen was basic view/provision. Second gen exposed advanced array functions. Third gen worked on simplifying/merging multiple plug-ins. Fourth gen worked on initial RBAC for VMware/storage teams. Fifth gen is current.

    Next gen: vSphere 5.1 requires a new plug-in architecture around FLEX.

    "We use EMC Virtual Storage Integrator (VSI) to dramatically accelerate and simplify storage configuration, management, and multipathing, and it has saved us days of work." - Mike Schlimenti, Lead Systems Engineer, Data Center, Experian

  • FLEX Plugin Demo

  • Keep It Simple

    Key Best Practices 2012 #5

  • Keep Storage Simple

    1. Use large-capacity datastores: avoid RDMs; NFS: 16TB; VMFS: vSphere 5 = 64TB, vSphere 4 = 2TB; avoid extents

    2. On the array, consider: use storage pools; use thin-provisioned LUNs & volumes; enable vCenter managed datastore alerts; enable array thin-provisioning alerts and auto-grow capabilities

    3. Use broad data services rather than micromanage: virtual / auto-tiering & large caches; enable data deduplication

  • SDRS and SIOC

    Key Best Practices 2012 #6

  • Use SDRS/SIOC if you can

    SDRS and SIOC are huge vSphere features. "If you can" equals: vSphere 4.1 or later, Enterprise Plus; VMFS, or NFS if vSphere 5.1 (not purely a qual)

    Enable it (not on by default); even if you don't use shares, it will ensure no VM swamps the others

    Bonus: you will get guest-level latency alerting! The default threshold is 30ms. Leave it at 30ms for 10K/15K, increase to 50ms for 7.2K, decrease to 10ms for SSD

    Fully supported with array auto-tiering; leave it at 30ms for FAST pools. Hard I/O limits are handy for View use cases.

    Some good recommended reading:
    http://www.vmware.com/files/pdf/techpaper/VMW-vSphere41-SIOC.pdf
    http://virtualgeek.typepad.com/virtual_geek/2010/07/vsphere-41-sioc-and-array-auto-tiering.html
    http://virtualgeek.typepad.com/virtual_geek/2010/08/drs-for-storage.html
    http://www.yellow-bricks.com/2010/09/29/storage-io-fairness/

  • Storage DRS Operations: I/O Thresholds

    SDRS triggers action on capacity and/or latency.

    Capacity stats are constantly gathered by vCenter; default threshold 80%.

    I/O load trend is evaluated (by default) every 8 hours based on the past day's history; default threshold 15ms.

    Storage DRS will do a cost/benefit analysis! For latency, Storage DRS leverages Storage I/O Control functionality.

    When using array auto-tiering, use SDRS but disable the I/O metric. This combination gives you the simplicity benefits of SDRS for automated placement and capacity balancing, and adds the economic and performance benefits of automated tiering across SSD, FC, SAS, and SATA, with 10x (VNX) and 100x (VMAX) higher granularity (sub-VMDK).

  • Storage DRS: Array use-case considerations (SDRS initial placement / SDRS migration)

    VMware Linked Clones: Not supported
    VMware Snapshots: Supported
    VMware SRM: Not supported
    RDM Pointer Files: Pointers supported / LUNs not
    Pre-vSphere 5 hosts: Not supported
    NFS Datastores: Supported
    Distributed Virtual Volumes: Supported
    Array-based VM Clones: Supported
    Array-based Replication: Supported; unanticipated migrations will increase WAN utilization
    Array-based Snapshots: Supported; unanticipated migrations will increase space consumed
    Array-based Compression & Deduplication: Supported; unanticipated migrations will temporarily increase space consumed
    Array-based Thin Provisioning: Supported; migration supported on VASA-enabled arrays only
    Array-based Auto-Tiering: Supported; for migration, disable I/O metrics in SDRS but enable SIOC on the datastores to handle spikes of I/O contention

  • What to do when you're in trouble...

    Getting yourself out of a jam

  • My VM is not performing as expected

    How do I know: the application is not meeting a pre-defined SLA, or SIOC/SDRS or guest OS thresholds are being exceeded

    What do I do: Step 1, pinpoint (thank you Scott Drummonds!)
    Use esxtop first: http://communities.vmware.com/docs/DOC-5490
    ...then vscsiStats: http://communities.vmware.com/docs/DOC-10095
    (a command-line sketch follows below)

    Step 2, if it is the backend: use Unisphere Analyzer, SPA (start with backend and CPU); check VM alignment (will show excessive stripe crossings); check that cache is enabled and the FAST/FAST Cache settings on the storage pool; ensure FAST and SIOC settings are consistent
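
    A minimal sketch of the pinpointing step from the ESXi shell; the world group ID is a placeholder obtained from the list command.

    # Interactive esxtop: press 'd' for adapter view, 'u' for device view, 'v' for per-VM disk view
    esxtop
    # List running VMs and their world group IDs
    vscsiStats -l
    # Start collection for one VM, then dump a latency histogram
    vscsiStats -s -w <world_group_id>
    vscsiStats -p latency -w <world_group_id>
    # Stop collection when done
    vscsiStats -x -w <world_group_id>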

  • I see all these device events in vSphere

    How do I know: a VM is not performing well and there are LUN trespass warning messages in the event log

    What do I do: ensure the right failover mode and policy are used. Ensure you have redundant paths from host to storage system. Check LUN ownership balance. (A path-check sketch follows below.)
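
    A quick sketch of the path checks from the ESXi 5.x shell; the device ID is a placeholder.

    # List every path to the device and its state (active/standby/dead)
    esxcli storage core path list -d <naa_device_id>
    # Show the SATP/PSP that claimed the device and its current path selection
    esxcli storage nmp device list -d <naa_device_id>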

  • Datastore capacity utilization is low/high

    How do I know: managed datastore reports in vCenter 4.x+; array tools, e.g. EMC Unisphere (vCenter integration) reports, EMC ProSphere, NetApp OnCommand

    What do I do: consider doing space reclaim using vmkfstools -k; migrate the VM to a datastore backed by virtually provisioned storage. For a VMFS datastore, ESX thin provisioning/compress/dedupe can also be utilized.
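
    A couple of read-only checks from the ESXi shell before attempting reclaim; the device ID is a placeholder.

    # Datastore capacity and usage as the host sees it
    df -h
    # Whether the backing device reports itself as thin provisioned
    esxcli storage core device list -d <naa_device_id>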

  • My storage team gives me tiny devices

    How do I know: the storage team controls provisioning

    What do I do: this means you have a legacy mindset in storage; cloud is on-demand. New model: the storage admin provisions pools, and the VI admin consumes from those pools via plug-ins.

    VMAX uses hyper devices, and hypers are assembled into meta devices; VNX defaults are pooled devices. NetApp has aggregates, pools of RAID-DP protected disks. Both are the basis for LUNs and FlexVols.

    Engage your array vendor to move the storage team into the 21st century.

  • What? VAAI isn't working.

    How do I know: testing Storage vMotion/cloning with no-offload versus offload

    What do I do: ensure the block storage initiators for the ESX host are configured with ALUA on, and ensure the ESX server recognizes the change in the SATP (a sketch follows below). Look at I/O bandwidth in the vSphere Client and on the storage array. The benefit tends to be higher when you Storage vMotion across SPs.

    Google "Virtual Geek VAAI Bad News"
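
    A sketch of how to check the claim rules from the ESXi 5.x shell (the device ID is a placeholder); pair this with the VAAI status check shown earlier.

    # Look for an ALUA-capable SATP rule matching your array vendor/model
    esxcli storage nmp satp rule list
    # Confirm the device is actually claimed by that SATP after the change
    esxcli storage nmp device list -d <naa_device_id>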

  • My NFS-based VM is impacted following a storage reboot or failover

    How do I know: the VM freezes or, even worse, crashes

    What do I do: check your ESX NFS timeout settings and compare to the TechBook recommendations (only needed if the datastore wasn't created using the plug-in). Review your VM and guest OS settings for resiliency; see the TechBook for the detailed procedure on VM resiliency.

  • THANK YOU

  • FILL OUT A SURVEY

    EVERY COMPLETED SURVEY IS ENTERED INTO A DRAWING FOR A $25 VMWARE COMPANY STORE GIFT CERTIFICATE

  • vSphere 5 Storage Best Practices

    Chad Sakac, EMC Corporation

    Vaughn Stewart, NetApp

    INF-STO2980

    #vmworldinf
