Todd Muirhead, VMware; David Morse, VMware
VIRT1052BE
#VMworld #VIRT1052BE
Extreme Performance Series: Monster VM Database Performance
VMworld 2017 Content: Not for publication or distribution
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
#VIRT1052BE CONFIDENTIAL
Outline
• Monster Database VMs
– Capability / Scalability / Performance
• Oracle and SQL Server
– Generational Performance (Westmere-EX to Skylake-SP)
– Performance Impact of Hyper-Threading, NUMA, VMFS vs RAW
• Best Practices
– Compute
– Memory
– Storage
– Network
• Key Takeaways
Monster Database VM Testing Overview
Sockets, Cores, Logical Processors
• Processor counts continue to increase with each generation
– Hyper-Threading doubles the number of Logical Processors, but doesn't double performance
• When sizing your VMs, CPU Cores is the most relevant value
– Hyper-Threading typically provides 15-20% more performance
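As a rough illustration of why physical cores, not logical processors, drive sizing, the sketch below applies the 15-20% Hyper-Threading range quoted above to a core count. The function name and the "core equivalents" framing are illustrative assumptions; the 96-core example matches the Broadwell-EX host used later in this deck.

```python
def ht_capacity_range(physical_cores, ht_gain_low=0.15, ht_gain_high=0.20):
    """Estimate host capacity with Hyper-Threading enabled.

    Returns (logical_processors, low, high), where low/high are capacity
    expressed in core equivalents. The gain range is the typical figure
    from the slide, not a measured value for any particular workload.
    """
    logical_processors = physical_cores * 2  # HT exposes two threads per core
    low = physical_cores * (1 + ht_gain_low)
    high = physical_cores * (1 + ht_gain_high)
    return logical_processors, low, high


logical, low, high = ht_capacity_range(96)
print(f"{logical} logical processors, roughly {low:.1f} to {high:.1f} core equivalents")
```

Doubling the logical processor count adds well under a quarter more capacity in this estimate, which is why vCPU counts are best planned against cores.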
Know Your NUMA: Right-Size VMs to Your Host Architecture
– NUMA = Non-Uniform Memory Access
– Physical hosts are divided into NUMA nodes; each node = 1 CPU and its local memory
– NUMA layout for a typical 4-socket Intel server:
  Example: for a host with 4 CPUs and 1 TB of RAM, each NUMA node = 1 CPU and 256 GB RAM
– NUMA allows CPUs to access local memory faster than non-local (remote) memory
– If possible, right-size VMs (vCPU and vMem) to “fit” your underlying host architecture
• For Monster VMs, ESXi automatically creates virtual NUMA nodes for the VM and spans physical NUMA nodes
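The right-sizing check can be sketched in a few lines. This is a minimal illustration, assuming one NUMA node per socket with memory split evenly across nodes, as in the 4-CPU / 1 TB example; the `Host` class, the function name, and the 24 cores per socket are hypothetical choices for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Host:
    sockets: int
    cores_per_socket: int
    memory_gb: int

    @property
    def node_memory_gb(self):
        # Assumption: one NUMA node per socket, memory divided evenly
        return self.memory_gb / self.sockets


def numa_nodes_needed(host, vcpus, vmem_gb):
    """Smallest number of NUMA nodes that can hold the VM's vCPUs and memory."""
    by_cpu = math.ceil(vcpus / host.cores_per_socket)
    by_mem = math.ceil(vmem_gb / host.node_memory_gb)
    return max(by_cpu, by_mem)


host = Host(sockets=4, cores_per_socket=24, memory_gb=1024)
for vcpus, vmem_gb in [(24, 256), (16, 512), (96, 256)]:
    nodes = numa_nodes_needed(host, vcpus, vmem_gb)
    print(f"{vcpus} vCPUs / {vmem_gb} GB spans {nodes} NUMA node(s)")
```

A VM that needs only one node by CPU count can still be forced across nodes by its memory size, so both dimensions are checked.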
Monster VM Host Servers Intel 4-P Codename/Date
• Westmere-EX (2011)
• Ivy-Bridge-EX (2014)
• Haswell-EX (2015)
• Broadwell-EX (2016)
• Skylake-SP (2017)
High Speed Storage – IBM FlashSystem® A9000
• Highly Parallel Architecture
• Easy to Integrate with VMware vSphere
• Based on high performance IBM MicroLatency™ modules
• Inline deduplication and compression
• Easy to configure storage while still getting best performance possible
• Easily handled all Monster DB testing while maintaining low latency of ~1 ms for all I/O
Testing Configuration
• Westmere-EX Server: Xeon E7-4870, VMware vSphere 6.0
• IvyBridge-EX Server: Xeon E7-4890 v2, VMware vSphere 6.5
• Haswell/Broadwell-EX Servers: Xeon E7-8890 v3, v4, VMware vSphere 6.5
• Skylake-SP Server: Xeon Platinum 8180, VMware vSphere 6.5
• Driver Server: 2 socket, VMware vSphere 6.5
• Storage Array: IBM FlashSystem A9000
• Connections shown in diagram: Ethernet, Fibre Channel
DVD Store 3
• Test workload was open source DVD Store 3 (github.com/dvdstore/ds3)
– OLTP workload simulating online store selling “DVDs”
– Utilizes many database features including stored procedures, transactions, triggers, foreign keys, and full-text indexes
– New version 3 includes customer reviews with intelligent review rankings
– Measured in terms of Orders Per Minute
– Each order is made up of a series of steps – login, browse for products, browse reviews, purchase products
– Supports Oracle, SQL Server, and MySQL
– Workload was run at increasing levels of load to find the highest performing test configuration
– New and more complex queries make results not comparable with previous DVD Store versions
Oracle Performance
Oracle Testing Overview
• Virtual Machine Configuration
– Oracle 12c
– Red Hat Enterprise Linux 7.2
– 256GB RAM
– pvSCSI
– Vmxnet3
• Virtual CPU / VM counts
– Lots of different configurations based on hardware (Total Physical Processors / Cores / Threads)
Cool esxtop Screenshot
Virtual Performance Grows with Hardware Performance Increases
[Chart: Generational Monster Oracle DB VM Performance. Y-axis: Relative Orders Per Minute (OPM), 0 to 4.5. Bars: Westmere 40 vCPU VM, Ivy-Bridge 60 vCPU VM, Haswell 72 vCPU VM, Broadwell 96 vCPU VM, Skylake 112 vCPU VM.]
Hyper-Threading Performance gain of 11%
[Chart: Performance Gain Using Hyper-Threads on 4-Socket Intel Broadwell Server. Y-axis: Relative Orders Per Minute (OPM). Bars: 1 x 96 vCPU VM, 2 x 96 vCPU VMs. The chart appears twice, once with a 0 to 1.5 axis and once zoomed to 0.8 to 1.2.]
Increase in Capacity
40 vCPU VM on Broadwell-EX performs 50% better than on Westmere-EX*
*with room left over for another 40 vCPU VM
[Chart: 40 vCPU Monster DB VM Performance. Axes: Orders Per Minute (OPM), 0 to 45,000, and ESXi Host % CPU Util, 0 to 100. Westmere-EX 40 vCPUs: 98% Host CPU Util. Broadwell-EX 40 vCPUs: 42% Host CPU Util.]
SQL Server Performance
SQL Server Testing Overview
• Virtual Machine Configuration
– SQL Server 2014, 2016
– Windows Server 2012 R2
– 256GB RAM
– pvSCSI
– Vmxnet3
• Lots of different configurations based on hardware (Total Physical Processors / Cores / Threads)
Generational Monster VM SQL Server Performance
Skylake-SP achieved over 4X performance (OPM) vs. Westmere-EX based system
Orders Per Minute (OPM) by server generation:
Server        | Processors              | Physical Cores/vCPUs | OPM
Westmere-EX   | 4 x E7-4870             | 40                   | 38,482
IvyBridge-EX  | 4 x E7-4890 v2          | 60                   | 70,869
Haswell-EX    | 4 x E7-8890 v3          | 72                   | 95,216
Broadwell-EX  | 4 x E7-8890 v4          | 96                   | 123,552
Skylake-SP    | 4 x Xeon Platinum 8180  | 112                  | 161,287
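The "over 4X" claim can be checked directly from the OPM figures above. The sketch below computes the speedup of each generation relative to Westmere-EX, plus OPM per core; the per-core figure is an extra derived metric, not one reported on the slide, and the names `RESULTS` and `summarize` are illustrative.

```python
# (physical cores / vCPUs, Orders Per Minute) as reported for each generation
RESULTS = {
    "Westmere-EX": (40, 38_482),
    "IvyBridge-EX": (60, 70_869),
    "Haswell-EX": (72, 95_216),
    "Broadwell-EX": (96, 123_552),
    "Skylake-SP": (112, 161_287),
}


def summarize(results, baseline="Westmere-EX"):
    """Return speedup vs. the baseline and OPM per core for each server."""
    _, base_opm = results[baseline]
    summary = {}
    for name, (cores, opm) in results.items():
        summary[name] = {
            "speedup": opm / base_opm,
            "opm_per_core": opm / cores,
        }
    return summary


summary = summarize(RESULTS)
for name, row in summary.items():
    print(f"{name:13s} {row['speedup']:.2f}x  {row['opm_per_core']:.0f} OPM/core")
```

Skylake-SP works out to about 4.19x the Westmere-EX result, and OPM per core rises from about 962 to about 1,440, so the gain comes from faster cores as well as more of them.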
Performance Per Watt
Newer servers are also more efficient, delivering higher OPM per watt
VMFS vs RDM on SQL Server 2016
VMFS and RDM performance differed by only about 1%
SQL Server on Linux
SQL Server on Linux
• Microsoft is bringing SQL Server to the Linux platform. Currently in technical preview.
• Worked with Microsoft on some tests to validate the performance on vSphere 6.5
• Test Workload – Cloud Database Benchmark – was provided by Microsoft
– Transactional workload, output measured in terms of Transactions Per Second (TPS)
– Test database size was 1.5TB
• Test system was Broadwell-EX based server with 96 cores (24 cores per socket).
• Virtual Machine was configured for the test:
– 512 GB of RAM
– RHEL 7.2
– Current build of SQL Server for Linux as of July 2017
Cool CDB / Matrix Screenshot
SQL Server for Linux Monster VM Test Results
• Gain from 24 vCPU VM to a 96 vCPU VM was good with 3.2x scaling
[Chart: SQL Server for Linux Monster VM Scaling on vSphere 6.5 with CDB Workload. Y-axis: Transactions Per Second (TPS), 0 to 16,000. Bars: 24 vCPUs, 48 vCPUs, 72 vCPUs, 96 vCPUs.]
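To put the 3.2x figure in context, it can be expressed as a fraction of ideal linear scaling. This is a small worked sketch; the function name is illustrative, and only the 3.2x gain and the 24 and 96 vCPU endpoints come from the slide.

```python
def scaling_efficiency(speedup, base_vcpus, scaled_vcpus):
    """Fraction of ideal (linear) scaling achieved when growing a VM."""
    ideal_speedup = scaled_vcpus / base_vcpus
    return speedup / ideal_speedup


efficiency = scaling_efficiency(speedup=3.2, base_vcpus=24, scaled_vcpus=96)
print(f"{efficiency:.0%} of linear scaling")
```

Quadrupling the vCPU count for a 3.2x throughput gain corresponds to about 80% of linear scaling.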
Monster Database VM Best Practices
CPU Best Practices
• First, size the CPU resources a VM needs – then look at NUMA node sizes
– Monster DB VMs that need to span NUMA nodes (aka “wide” VMs) perform great, so if you need more vCPUs than what one socket provides, increase it!
– If you don’t need that much CPU, then it can be more efficient to keep things within a NUMA node
• Run monster DB VMs on newest servers for best performance
• Do not pin VMs (CPU affinity)
• Enable Hyper-Threading (usually BIOS default)
• Leave Latency Sensitivity at default value of Normal
Memory Best Practices
• Use large memory pages
– Virtual databases benefit from large pages more than most applications due to memory usage patterns
• Set memory reservation
– Equal to Oracle SGA / SQL Server active memory
• Size VM memory based on NUMA node memory size
Storage Best Practices
• Configure storage carefully and properly
– Monster database VM is going to need dedicated storage LUNs
– Check to make sure the number of disks, type of disks, and RAID type are “right”
– Work with the storage administrator closely on storage configuration
• Use flash or SSDs if possible - IBM FlashSystem in our tests
• Virtual SCSI adapters should always be paravirtual SCSI (pvSCSI)
• If using iSCSI or NFS based storage, enable jumbo frames
Oracle Monster VM Best Practices
• Use Oracle Best Practices within the VM
– Use Oracle installer RPM to prep the VM with their best practices
– Evaluate the use of NUMA before enabling (Default is off)
– Use Linux Hugepages (sometimes called large pages)
• Enable at the OS level
• Oracle memory management parameters
• Enable virtual NUMA for the VM if it is wide
– Even though Oracle is not using NUMA, performance was still better with vNUMA than without
• Refer to Oracle Databases on VMware Best Practices Guide for more information
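Enabling HugePages at the OS level comes down to reserving enough pages to back the SGA. The sketch below shows the arithmetic, assuming the 2 MiB huge page size that is the usual default on x86-64 Linux (check `Hugepagesize` in `/proc/meminfo`); the 200 GiB SGA is a hypothetical value for illustration, not the setting used in these tests.

```python
import math


def hugepages_for_sga(sga_gib, hugepage_size_kib=2048):
    """Number of huge pages needed to back an SGA of sga_gib GiB.

    hugepage_size_kib defaults to 2048 (2 MiB), an assumption that should
    be confirmed against Hugepagesize in /proc/meminfo on the guest.
    """
    sga_kib = sga_gib * 1024 * 1024
    return math.ceil(sga_kib / hugepage_size_kib)


pages = hugepages_for_sga(200)  # hypothetical 200 GiB SGA
print(f"vm.nr_hugepages = {pages}")
```

The result is the minimum for the SGA alone; in practice a small margin is often added, and the VM's memory reservation should cover the same amount.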
SQL Server Monster VM Best Practices
• SQL Server is largely auto-tuning; newer releases better utilize NUMA
– “It Just Runs Faster – Apply SQL Server 2016 and SQL Server internally leverages SOFT NUMA partitioning to achieve double digit performance gains.” Source: https://blogs.msdn.microsoft.com/psssql/2016/03/30/sql-2016-it-just-runs-faster-automatic-soft-numa/
– However, some tuning is still advisable; here are a couple of bookmark-worthy resources:
• 5 SQL Server Settings to Change: https://www.brentozar.com/archive/2013/09/five-sql-server-settings-to-change/
• Microsoft SQL Server on VMware vSphere Best Practices Guide: http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/sql-server-on-vmware-best-practices-guide.pdf
SQL Server Monster VM Best Practices
• Ensure your SQL Server license allows for the scale you need
– Web/Express/Standard have limitations vs. Developer/Enterprise Editions:
Source: https://www.microsoft.com/en-us/cloud-platform/sql-server-editions
SQL Server Monster VM Best Practices
• Right-sizing your monster SQL VMs is critical
– ESXi multi-CPU VMs are single-core by default (1 core/socket)
– Windows Server 2012 supports up to 64 sockets and 640 logical processors
• Therefore, you must increase cores per socket to go above 64 vCPU
– Example: For a 96-core physical host (4 sockets, 24 cores/socket), size the VM appropriately
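The socket limit above can be turned into a quick topology check. The sketch below assumes that sizing the VM "appropriately" means mirroring the host's socket count, which is one reasonable reading rather than something the slide spells out; the function names are illustrative.

```python
def default_topology_ok(vcpus, max_guest_sockets=64):
    """With 1 core/socket, every vCPU is presented to the guest as a socket."""
    return vcpus <= max_guest_sockets


def vm_topology(vcpus, host_sockets, max_guest_sockets=64):
    """Pick (sockets, cores_per_socket) for a VM, mirroring the host's sockets.

    Mirroring the host layout is an assumption made for this example.
    """
    if host_sockets > max_guest_sockets:
        raise ValueError("socket count exceeds the guest OS limit")
    if vcpus % host_sockets != 0:
        raise ValueError("vCPU count does not divide evenly across sockets")
    return host_sockets, vcpus // host_sockets


print(default_topology_ok(96))          # 96 single-core sockets exceed the 64-socket limit
print(vm_topology(96, host_sockets=4))  # layout matching a 4-socket, 24-core/socket host
```

For the 96-core host in the example, this yields 4 virtual sockets with 24 cores each, which stays well under the 64-socket limit in Windows Server 2012.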
Key Takeaways
• Upgrading to the latest hardware can provide big performance gains
• Low Latency of IBM FlashSystem enabled great performance
• Monster Database VMs are capable of handling all workloads
• Using best practices will ensure best performance
• Understand how to monitor and troubleshoot performance
Extreme Performance Series – Barcelona
• SER2724BE Performance Best Practices
• SER2343BE vSphere Compute & Memory Schedulers
• SER1504BE vCenter Performance Deep Dive
• SER2849BE Predictive DRS – Performance & Best Practices
• VIRT1445BE Fast Virtualized Hadoop and Spark on All-Flash Disks
• VIRT1397BE Optimize & Increase Performance Using VMware NSX
• VIRT1052BE Monster VM Database Performance
• FUT2020BE Wringing Max Perf from vSphere for Extremely Demanding Workloads
Extreme Performance Series – Hands-on Labs
Don’t miss these popular Extreme Performance labs:
• HOL-1804-01-SDC: vSphere 6.5 Performance Diagnostics & Benchmarking
– Each module dives deep into vSphere performance best practices, diagnostics, and optimizations using various interfaces and benchmarking tools
• HOL-1804-02-CHG: vSphere Challenge Lab
– Each module places you in a different fictional scenario to fix common vSphere operational and performance problems
Performance Survey
The VMware Performance Engineering team is always looking for feedback about your experience with the performance of our products, our various tools, interfaces and where we can improve.
Scan this QR code to access a short survey and provide us direct feedback.
Alternatively: www.vmware.com/go/perf
Thank you!
Q & A