Sizing & TCO for bullion
February 2013
© Bull, 2013 BTSA Feb. 2013
Agenda
Sizing:
– Methodology
– bullion performance numbers
– Consolidation (scale-out vs. scale-up)
– Excel tool
TCO:
– Examples
– Excel tool
Input data
– Inventory of the physical servers and VMs to be replaced by bullions, from which the required SPECint*rate performance is derived
– Performance requirements (SPECint*rate, SAPS, …)
– Physical configuration: number of cores, GHz, sockets, RAM size
– Number of VMs and possible VM consolidation
– ESXi over-commitment ratio for CPU and memory: usually from 1 to 5, depending on the performance expected for the applications
– IOs: bandwidth for vMotion, for the VMs, for H.A.
– High Availability: number of nodes in the cluster
– DRS (2 sites): yes or no; 1 or 2 clusters (synchronous/asynchronous)
bullion performance (1): ratio within the E7-4800 series
CPU | Cores | GHz | TDP | H.T. | Perf. ratio E7-xxxx/E7-4870 (2) | Perf. 4-socket bullion | Perf. 8-socket bullion | Perf. 12-socket bullion | Perf. 16-socket bullion | Perf./price ratio, 4-socket bullion
E7-4807 | 6 | 1.86 | 95W | yes | 0.461 | 517 | 974 | 1430 | 1896 | 2.241
E7-4820 | 8 | 2.00 | 105W | yes | 0.672 | 753 | 1419 | 2084 | 2763 | 2.286
E7-4830 | 8 | 2.13 | 105W | yes | 0.727 | 814 | 1533 | 2253 | 2987 | 1.594
E7-8837 | 8 | 2.66 | 130W | no | 0.741 | 829 | 1563 | 2296 | 3044 | 1.679
E7-4850 | 10 | 2.00 | 130W | yes | 0.853 | 955 | 1800 (3) | 2645 | 3506 | 1.45
E7-8867L | 10 | 2.13 | 105W | yes | 0.887 | 994 | 1872 | 2750 | 3646 | 0.943
E7-4860 | 10 | 2.26 | 130W | yes | 0.940 | 1052 | 1983 | 2913 | 3862 | 1.102
E7-4870 | 10 | 2.40 | 130W | yes | 1 | 1120 (3) | 2110 (3) | 3100 (4) | 4110 (3) | 1
– E7-4820 is the best perf./price ratio among 8-core CPUs
– E7-4850 is the best perf./price ratio among 10-core CPUs
– E7-4870 is the best performance
(1) SPECint*rate_base2006
(2) Intel reference
(3) Published on spec.org
(4) Estimated
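The derived rows of this table can be approximated with two simple rules: the estimated bullion result is the Intel ratio times the E7-4870 result for the same number of sockets, and the best perf./price choice is a lookup within a core count. A minimal Python sketch, with values copied from the table (the 12-socket E7-4870 figure is itself an estimate); `estimated_perf` and `best_perf_price` are illustrative names, not functions of the Excel tool.

```python
# Per-CPU data copied from the table:
# (cores, perf. ratio vs E7-4870, perf./price ratio on a 4-socket bullion)
CPUS = {
    "E7-4807": (6, 0.461, 2.241),
    "E7-4820": (8, 0.672, 2.286),
    "E7-4830": (8, 0.727, 1.594),
    "E7-8837": (8, 0.741, 1.679),
    "E7-4850": (10, 0.853, 1.45),
    "E7-8867L": (10, 0.887, 0.943),
    "E7-4860": (10, 0.940, 1.102),
    "E7-4870": (10, 1.0, 1.0),
}

# E7-4870 SPECint*rate per bullion size (sockets -> result);
# the 12-socket value is an estimate in the table.
E7_4870_RESULTS = {4: 1120, 8: 2110, 12: 3100, 16: 4110}


def estimated_perf(cpu: str, sockets: int) -> int:
    """Scale the E7-4870 result by the Intel performance ratio."""
    _, ratio, _ = CPUS[cpu]
    return round(ratio * E7_4870_RESULTS[sockets])


def best_perf_price(cores: int) -> str:
    """Return the CPU with the highest perf./price ratio for a given core count."""
    same_cores = [name for name, (c, _, _) in CPUS.items() if c == cores]
    return max(same_cores, key=lambda name: CPUS[name][2])


print(best_perf_price(8), best_perf_price(10))
print(estimated_perf("E7-4850", 8))
```

Some table cells differ from this simple product by a point or two (e.g. 829 in the table vs. 830 computed for the 4-socket E7-8837), so treat the sketch as an approximation of the table, not a replacement for it.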
CPU perf. (native Linux SPECint®_rate2006 with E7-4870)
[Chart: SPECint*rate vs. number of modules (1 to 4), compared with perfect linearity: 1120 with 1 module, 2110 with 2 modules, 4110 with 4 modules]
Scalability ~x4 (x3.67)
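The x3.67 figure is the 4-module result divided by the 1-module result. A small sketch of that arithmetic, also expressing it as a fraction of perfect linear scaling; the function names are illustrative.

```python
# E7-4870 bullion SPECint*rate results, keyed by number of modules (4 sockets each)
RESULTS = {1: 1120, 2: 2110, 4: 4110}


def speedup(modules: int) -> float:
    """Throughput relative to a single module."""
    return RESULTS[modules] / RESULTS[1]


def efficiency(modules: int) -> float:
    """Fraction of perfect linear scaling (1.0 = perfectly linear)."""
    return speedup(modules) / modules


for m in sorted(RESULTS):
    print(f"{m} module(s): x{speedup(m):.2f}, {efficiency(m):.0%} of linear")
```

With these results, 4 modules give x3.67, i.e. about 92% of perfect linearity.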
SPECint*rate
Benchmark | Hardware vendor | System | Result | Baseline | # Cores | # Chips
CINT2006rate | Bull SAS | bullion E7-4870 (160 cores - 4TB RAM) | 4110 | 3890 | 160 | 16
CINT2006rate | Hewlett-Packard Company | ProLiant DL980 G7 (2.4 GHz, Intel Xeon E7-4870) | 2180 | 2070 | 80 | 8
CINT2006rate | Bull SAS | bullion E7-4870 (80 cores - 2TB RAM) | 2110 | 2000 | 80 | 8
CINT2006rate | HITACHI | BladeSymphony BS2000 (Intel Xeon E7-8870) | 1920 | 1790 | 80 | 8
CINT2006rate | HITACHI | Compute Blade 2000 (Intel Xeon E7-8870) | 1920 | 1790 | 80 | 8
CINT2006rate | Unisys Corporation | Unisys ES7000 Model 7600R G3 (Intel Xeon E7-8870) | 1910 | 1780 | 80 | 8
CINT2006rate | NEC Corporation | Express5800/A1080a-E (Intel Xeon E7-8870) | 1900 | 1790 | 80 | 8
CINT2006rate | Fujitsu | PRIMEQUEST 1800E2, Intel Xeon E7-8870, 2.40 GHz | 1890 | 1770 | 80 | 8
CINT2006rate | Fujitsu | PRIMERGY RX900 S2, Intel Xeon E7-8870, 2.40 GHz | 1890 | 1770 | 80 | 8
CINT2006rate | IBM Corporation | IBM System x 3850 X5 (Intel Xeon E7-8870) | n/a | 1770 | 80 | 8
BCS or BCS-like
Glueless 8 sockets
Intel Xeon Processor E5 and E7 performance comparison
– Intel Xeon E7-4800 series: ideal for data-demanding application performance
– Intel Xeon E5-4600: for HPC
Bullion 4-sockets
On-Line Transaction Processing (OLTP) perf.
bullion E7-4870 with VMware | tpmC (estimation) | tpsE (estimation)
4 sockets | ~2,800,000 | ~2,700
8 sockets | ~5,500,000 | ~4,600
12 sockets | ~7,500,000 | ~7,100
16 sockets | ~10,000,000 | ~9,500
Virtualization perf.: SPECvirt benchmark
bullion with VMware | SPECvirt_sc2010
4 sockets (X7560) | 2721 @ 168 VMs (28 tiles) (1)
8 sockets (E7-4870) | 8287 @ 512 VMs (85 tiles) (2)
12 sockets | N/A (3)
16 sockets | N/A (3)
(1) Published in February 2011 with 512 GB, 32 cores & ESXi 4.1
(2) Estimation
(3) ESXi 5 is limited to 512 VMs and 160 logical CPUs
ERP performance
bullion with VMware | SAPS
4 sockets (X7560) | 41,420 (1)
8 sockets (E7-4870) | 100,000 (2)
12 sockets (E7-4870) | 175,000 (2)
16 sockets (E7-4870) | 250,000 (2)
(1) Published in May 2010 with 128 GB in a 2-tier SD architecture
(2) Estimation in a 3-tier SD architecture
CPU load & VMs: comparison scale-out/scale-up
Same number of cores (320) on both sides; each block in the diagram = 8 vCPUs.
scale-out (2-socket x 8-core servers): 20x 16-core servers => 20 ESXi
In each server: 2 VMs of 8 vCPUs (no vCPU left), or 1 VM of 16 vCPUs, or 0 VM of 32 vCPUs
• VMs limited to 16 vCPUs
• Load peaks => servers are 100% full
• vMotion impossible
scale-up (16-socket x 10-core bullions): 2x 160-core bullions => 2 ESXi
In each server: 16 VMs of 8 vCPUs, or 8 VMs of 16 vCPUs, or 4 VMs of 32 vCPUs, or 2 VMs of 64 vCPUs => 32 free vCPUs left in each bullion
• No limitation on the VM size
• Load peaks: fully managed (without vMotion)
• vMotion possible for big VMs
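The VM counts above follow from a simple placement rule: with one physical core per vCPU, a VM cannot be larger than its host, and the host fills up in whole VMs. A minimal sketch; `place` is a hypothetical helper, and the 128 vCPUs of load per bullion is an assumption read from the diagram (16 VMs of 8 vCPUs).

```python
from typing import Optional, Tuple


def place(host_cores: int, vm_vcpus: int, load_vcpus: int) -> Optional[Tuple[int, int]]:
    """Place up to load_vcpus worth of identical VMs on one host, with one
    physical core per vCPU (no over-commitment).

    Returns (VMs placed, free vCPUs), or None when a single VM is larger
    than the host and therefore cannot run on it at all.
    """
    if vm_vcpus > host_cores:
        return None
    vms = min(load_vcpus, host_cores) // vm_vcpus
    return vms, host_cores - vms * vm_vcpus


# Scale-out: 16-core server, fully loaded (16 vCPUs of VMs)
for size in (8, 16, 32):
    print("16-core server,", size, "vCPU VMs:", place(16, size, 16))

# Scale-up: 160-core bullion carrying 128 vCPUs of VMs (assumption from the diagram)
for size in (8, 16, 32, 64):
    print("160-core bullion,", size, "vCPU VMs:", place(160, size, 128))
```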
CPU load & VMs: comparison scale-out/scale-up
(each block in the diagram = 8 vCPUs)
scale-out (2-socket x 8-core servers): 32x 16-core servers => 32 ESXi
• 256 cores used
• 512 cores paid
scale-up (16-socket x 10-core bullions): 2x 160-core bullions => 2 ESXi (32 free vCPUs in each bullion)
• 256 cores used
• 320 cores paid
HW investment: -37.5%
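The -37.5% is the reduction in cores paid for the same 256 cores used. A minimal sketch, assuming (as the slide appears to) that hardware investment is proportional to the number of cores paid; `hw_saving` is an illustrative name.

```python
def hw_saving(cores_paid_before: int, cores_paid_after: int) -> float:
    """Relative reduction in paid cores (used here as a proxy for HW cost)."""
    return 1 - cores_paid_after / cores_paid_before


scale_out_paid = 32 * 16   # 32 two-socket servers x 16 cores = 512 cores
scale_up_paid = 2 * 160    # 2 sixteen-socket bullions x 160 cores = 320 cores
print(f"HW investment: -{hw_saving(scale_out_paid, scale_up_paid):.1%}")
```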
CPU load & VMs: comparison scale-out/scale-up, with VMware HA
(each block in the diagram = 8 vCPUs)
scale-out (2-socket x 8-core servers): 32x 16-core servers => 32 ESXi
• 256 cores used
• 512 cores paid
scale-up (8-socket x 10-core bullions): 5x 80-core bullions => 5 ESXi
• 320 cores used
• 400 cores paid
HW investment: -22%; performance: +25%
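With five bullions, the high-availability rule given elsewhere in this deck caps the average load at 80%, which is where the 320 usable cores come from. A sketch of the arithmetic, assuming that 80% cap and comparing against the 512 cores paid and 256 cores used on the scale-out side; variable names are illustrative.

```python
HA_MAX_LOAD = 0.80        # 5 bullions: (N-1)/N = 80%, also the overall 80% cap

scale_out_paid, scale_out_used = 32 * 16, 256
bullions, cores_per_bullion = 5, 80

scale_up_paid = bullions * cores_per_bullion          # 400 cores
scale_up_used = scale_up_paid * HA_MAX_LOAD           # cores usable with HA headroom

hw_change = scale_up_paid / scale_out_paid - 1        # -21.875%, shown as -22%
perf_change = scale_up_used / scale_out_used - 1      # +25%
print(f"{scale_up_used:.0f} cores used, HW {hw_change:+.0%}, performance {perf_change:+.0%}")
```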
CPU load & VMs: comparison scale-out/scale-up
(each block in the diagram = 8 vCPUs)
scale-out (2-socket x 8-core servers): 32x 16-core servers => 32 ESXi
• Communication between VMs goes through the NICs
scale-up (16-socket x 10-core bullions): 2x 160-core bullions => 2 ESXi (32 free vCPUs in each bullion)
• Communication stays internal to the bullion => fewer Ethernet adapters/cables/switches => best performance
VMs: size and quantity
In a 16-socket bullion (160 cores), you can theoretically fit up to 5 VMs of 32 vCPUs with one physical core available for each vCPU, for best performance (no over-commitment).
On a 4-socket X7560 bullion (64 logical CPUs with H.T.), we could run 168 VMs with a CPU over-commitment of x2.6 and a good QoS (cf. SPECvirt constraints):
28 tiles, each with 6 VMs (1 vCPU each):
– 1 DB server + 1 Java application server + 1 mail server + 1 web server + 1 NFS server + 1 standby server used to measure network latency (SPECpoll: 99.5% of requests < 1 s)
Some consolidation projects also make it possible to consolidate VMs within the same cluster, which reduces the necessary HW (CPU, RAM, IOs).
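The x2.6 over-commitment is the number of vCPUs divided by the number of logical CPUs. A sketch of that calculation; it assumes 8 cores per X7560 socket (consistent with the 32 cores quoted for the published result) and Hyper-Threading doubling the logical CPUs.

```python
tiles = 28
vms_per_tile = 6          # DB, Java app, mail, web, NFS and standby server
vcpus_per_vm = 1
sockets, cores_per_socket = 4, 8    # Xeon X7560: 8 cores per socket
threads_per_core = 2                # Hyper-Threading enabled

vms = tiles * vms_per_tile
logical_cpus = sockets * cores_per_socket * threads_per_core
overcommit = vms * vcpus_per_vm / logical_cpus
print(f"{vms} VMs on {logical_cpus} logical CPUs -> over-commitment x{overcommit:.3f}")
```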
VDI sizing
For Citrix XenDesktop (running on top of the ESXi hypervisor):
– 1 VM per user
– 1 physical core for 8 VMs
– 1 GB of memory per VM (no memory over-commitment, in order to avoid swapping)
More precisely, memory varies according to the guest OS: from 512 MB for a Windows XP VM to 2 GB for a Windows 7 VM.
Example: for 1,500 concurrent users => ~190 cores (1500/8 = 187.5, rounded up) & 1.5 TB of memory
The configuration must be tuned to take into account:
– Load and HA considerations (number of ESX hosts)
– Hosting of the other VMs needed for XenApp (XenApp broker, ...) and other Citrix modules
– Consolidation of other applications
– Etc.
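The VDI rules of thumb translate directly into a two-line calculation. A minimal sketch with the slide's defaults (8 VMs per core, 1 GB per VM); `vdi_sizing` is a hypothetical helper, and it does not cover the extra Citrix VMs, HA or other applications listed above.

```python
import math


def vdi_sizing(users: int, vms_per_core: int = 8, gb_per_vm: float = 1.0):
    """Return (physical cores, memory in GB) for a XenDesktop farm on ESXi:
    one VM per user, vms_per_core VMs per physical core and no memory
    over-commitment (gb_per_vm of RAM reserved per VM)."""
    cores = math.ceil(users / vms_per_core)
    memory_gb = users * gb_per_vm
    return cores, memory_gb


print(vdi_sizing(1500))                   # 1 GB per VM
print(vdi_sizing(1500, gb_per_vm=2.0))    # Windows 7 guests at 2 GB each
```

This gives 188 cores for 1,500 users; the slide rounds further up to ~190.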
CPU/memory load & High Availability
• Use several bullions (ESXi hosts) for your VMware cluster: if one ESXi/bullion fails, VMware HA restarts its VMs on the other bullions
• The minimum is 2 bullions (fail-safe / maintenance)
• For no performance degradation (no CPU/memory over-commitment*), the N-1 surviving bullions must be able to absorb the whole load, so the average load should not exceed (N-1)/N:
– 50% average load for 2 bullions
– 67% average load for 3 bullions <= best compromise
– 75% average load for 4 bullions
– 80% average load above
*Whatever the number of bullions, the maximum average load should be 80%.
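The percentages above are (N-1)/N, capped at 80%. A minimal sketch of that rule; `max_avg_load` is an illustrative name.

```python
def max_avg_load(bullions: int, cap: float = 0.80) -> float:
    """Highest average CPU/memory load that lets the surviving bullions absorb
    the VMs of one failed bullion without over-commitment."""
    if bullions < 2:
        raise ValueError("VMware HA needs at least 2 bullions")
    return min((bullions - 1) / bullions, cap)


for n in range(2, 7):
    print(f"{n} bullions: {max_avg_load(n):.0%}")
```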
CPU consolidation
Consolidating an installed base of small (1- or 2-socket) physical servers:
– By default, assume an average CPU load of no more than 15%
– Use Capacity Planner to obtain the exact figure (e.g. XX => 7% CPU for 49 servers)
Consolidating an installed base of small (1- or 2-socket) virtualized servers:
– By default, assume an average CPU load of 50%
– Use Capacity Planner to obtain the exact figure
bullion proposal: bullion should be sized for an average load of up to 80%
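Combining the measured (or default) load of the installed base with the 80% target gives the SPECint*rate the bullions must provide. A sketch under stated assumptions: the 49 servers and the 7% load come from the example above, but the 200 SPECint*rate per server is a hypothetical value, and `required_specint_rate` is an illustrative name.

```python
def required_specint_rate(server_perfs, avg_load, target_load=0.80):
    """SPECint*rate the bullions must provide so that the consolidated
    workload runs at no more than target_load on average."""
    consumed = sum(server_perfs) * avg_load   # capacity actually used today
    return consumed / target_load


# Hypothetical installed base: 49 physical servers of 200 SPECint*rate each
servers = [200] * 49
print(f"{required_specint_rate(servers, 0.15):.1f}")   # default 15% assumption
print(f"{required_specint_rate(servers, 0.07):.1f}")   # measured 7% (Capacity Planner)
```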
Memory consolidation
• Get the amount of memory and the memory load of the installed base to be consolidated
• The memory load (%) is either given by an audit tool such as Capacity Planner, or assumed to be 80% if unknown
• The sizing rules for memory in bullion are the same as for CPU (50%-50%, 67%-67%-67%, max 80%)
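Applying the CPU load rule to memory gives the RAM to configure in each bullion. A minimal sketch; the installed base (49 servers of 16 GB) is hypothetical, the 80% memory load is the slide's default, and `memory_per_bullion` is an illustrative name.

```python
def memory_per_bullion(installed_gb, memory_load, bullions, cap=0.80):
    """GB of RAM to configure in each bullion so that the used memory of the
    installed base fits under the HA load limit (N-1)/N, capped at 80%."""
    used_gb = installed_gb * memory_load
    max_load = min((bullions - 1) / bullions, cap)
    return used_gb / max_load / bullions


# Hypothetical installed base: 49 servers x 16 GB, load unknown -> assume 80%
print(f"{memory_per_bullion(49 * 16, 0.80, 3):.1f} GB per bullion")
```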
bullion Inputs/Outputs sizing
I/O: check the capabilities of bullion:
– 6 PCIe adapters per module: FC 4/8 Gbps, Ethernet 1/10 Gbps
– 4 internal 1 GigE ports per module
– WARNING: check bullion limitations with multi-module configurations
Keep in mind that VMs running in the same server (especially a 16-socket one) need fewer Ethernet adapters than on smaller servers, where VMs must communicate outside the server.
Sizing Ethernet communication
For applications with many IOs between VMs (e.g. a Xerox dematerialisation application):
=> you may reduce your global bullion configuration by up to ~25% (compared with small servers)
For applications with few IOs between VMs (e.g. VDI):
=> reduce your global bullion configuration by ~5%
Maximum IO configurations for a quad-module bullion
Adapter | #1 | #2 | #3 | #4 | #5 | #6
Activated Kawela (1 GigE) | 4 | 0 | 2 | 2 | 0 | 2
MegaRAID (disks) | 0 | 0 | 0 | 0 | 0 | 1
LPE12002/1250 (FC) | 7 | 7 | 4 | 7 | 7 | 7
i350-T2 (1 GigE) | 0 | 0 | 0 | 0 | 0 | 0
i350-T4 (1 GigE) | 0 | 4 | 0 | 0 | 0 | 0
X520-SR2/T2 * (10 GigE) | 0 | 0 | 3 | 3 | 4 | 3
* X520-DA2 can be ordered through SFR
Smaller configurations are possible by removing adapters
Maximum IO configurations for a tri-module bullion
Adapter | #1 | #2 | #3
Activated Kawela (1 GigE) | 0 | 0 | 0
MegaRAID (disks) | 0 | 0 | 1
LPE12002/1250 (FC) | 6 | 6 | 5
i350-T2 (1 GigE) | 0 | 2 | 2
i350-T4 (1 GigE) | 4 | 0 | 0
X520-SR2/T2 * (10 GigE) | 0 | 3 | 3
* X520-DA2 can be ordered through SFR
Smaller configurations are possible by removing adapters
Maximum IO configurations for a bi-module bullion
Adapter | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9
Activated Kawela (1 GigE) | 0 | 4 | 2 | 0 | 0 | 0 | 2 | 0* | 2**
MegaRAID | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0
LPE12002/1250 (FC) | 4 | 4 | 4 | 4 | 4 | 3 | 0 | 0 | 2
i350-T2 (1 GigE) | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0
i350-T4 (1 GigE) | 4 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0
X520-SR2/T2 (10 GigE) | 0 | 0 | 2 | 4 | 0 | 3 | 2 | 4 | 3
SFR only
* vSphere 5 maximum of 8x Eth 10 Gbps ports is respected
** vSphere 5 maximum of combined Eth 6x 10 Gbps ports + 4x 1 Gbps ports is respected
Smaller configurations are possible by removing adapters
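The two footnotes can be turned into a quick check of a candidate adapter mix. A sketch that models only those two vSphere 5 rules (8x 10 GbE, or 6x 10 GbE combined with 4x 1 GbE), not VMware's full list of configuration maximums. It assumes each activated Kawela counts as 2 x 1 GbE ports and each X520-SR2/T2 as 2 x 10 GbE ports; the helper names are hypothetical.

```python
# (speed in Gb/s, ports per adapter)
PORTS_PER_ADAPTER = {
    "kawela": (1, 2),     # ASSUMPTION: each activated Kawela counts as 2 x 1 GbE ports
    "i350-t2": (1, 2),
    "i350-t4": (1, 4),
    "x520": (10, 2),      # X520-SR2/T2: dual-port 10 GbE
}


def port_counts(config: dict) -> tuple:
    """Return (1 GbE ports, 10 GbE ports) for a {adapter: quantity} config."""
    ports = {1: 0, 10: 0}
    for adapter, qty in config.items():
        speed, per_adapter = PORTS_PER_ADAPTER[adapter]
        ports[speed] += qty * per_adapter
    return ports[1], ports[10]


def within_vsphere5_limits(config: dict) -> bool:
    """Check only the two vSphere 5 rules quoted in the footnotes."""
    one_g, ten_g = port_counts(config)
    if ten_g == 0:
        return True                      # 1 GbE-only limits are not modelled here
    if one_g == 0:
        return ten_g <= 8                # max 8 x 10 GbE ports
    return ten_g <= 6 and one_g <= 4     # combined: 6 x 10 GbE + 4 x 1 GbE


print(within_vsphere5_limits({"x520": 4}))                 # footnote * configuration
print(within_vsphere5_limits({"kawela": 2, "x520": 3}))    # footnote ** configuration
print(within_vsphere5_limits({"kawela": 1, "x520": 4}))    # hypothetical: too many ports
```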
Ethernet network example for a bi-module
– 2 x 10 Gb/s links dedicated to vMotion (1 TB can be evacuated in ~20 min) + VMware administration
– Huge bandwidth for the VMs (6 x 10 Gb/s links)
– Very high internal inter-module bandwidth (~300 Gb/s)
– Hyper-Threading can be activated (perf. +5-10%)
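A back-of-the-envelope check of the evacuation time: at full line rate, 1 TB over 2 x 10 Gb/s takes under 7 minutes, so the ~20 min figure corresponds to an effective throughput of roughly one third of line rate. The `efficiency` parameter is an assumption for illustration, not a measured vMotion value; `transfer_minutes` is a hypothetical helper using decimal terabytes.

```python
def transfer_minutes(data_tb: float, links: int, gbps_per_link: float,
                     efficiency: float = 1.0) -> float:
    """Minutes to move data_tb decimal terabytes over `links` links of
    gbps_per_link Gb/s, at the given fraction of line rate."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (links * gbps_per_link * 1e9 * efficiency)
    return seconds / 60


print(f"{transfer_minutes(1, 2, 10):.1f} min at line rate")
print(f"{transfer_minutes(1, 2, 10, efficiency=1/3):.1f} min at 1/3 of line rate")
```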
FC SAN example for a bi-module
- 4 HBAs (2 HBAs per module)
- 4 boot paths
bullion sizing calculator (Excel file)
Sizing exercise
Propose an alternative solution with bullions for a DC with:
- 20 UCS B200 M2 blades (2x 6-core X5690 CPUs), 96 GB of memory per blade
- SPECint*rate of 1 blade = 432
Sizing exercise
- SPECint*rate of 1 UCS B200 M2 (2x 6-core X5690 CPUs) = 432
- 20 blades => 8,640 SPECint*rate
- Blade CPU load = 50% => target: 4,320 SPECint*rate
- 3 bi-module E7-4820 bullions provide:
  - at 100% CPU load: 4,256 SPECint*rate, i.e. 98.5% of the target
  - at 2/3 CPU load (HA limit for 3 bullions): 2,851 SPECint*rate, i.e. 66% of the target
- 4 bi-module E7-4820 bullions provide:
  - at 100% CPU load: 5,674 SPECint*rate, i.e. 131% of the target
  - at 3/4 CPU load (HA limit for 4 bullions): 4,256 SPECint*rate, i.e. 98.5% of the target
A good choice is to propose 4 bi-module E7-4820 bullions: even with one bullion down, the remaining three deliver about the target capacity.
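The exercise can be replayed from the performance table (1419 SPECint*rate for an 8-socket E7-4820 bullion) and the HA rule of thumb. A sketch; the sizing tool evidently uses slightly different per-bullion values (4,256 and 5,674 rather than 4,257 and 5,676), so expect differences of a few points.

```python
BLADE_PERF = 432                   # SPECint*rate of one UCS B200 M2
BLADES, BLADE_LOAD = 20, 0.50
BULLION_E7_4820_8S = 1419          # bi-module (8-socket) E7-4820, from the performance table

target = BLADES * BLADE_PERF * BLADE_LOAD   # 4320 SPECint*rate actually used


def max_avg_load(bullions: int) -> float:
    """HA rule of thumb: (N-1)/N, capped at 80%."""
    return min((bullions - 1) / bullions, 0.80)


for n in (3, 4):
    full = n * BULLION_E7_4820_8S
    with_ha = full * max_avg_load(n)
    print(f"{n} bullions: {full} at 100% ({full / target:.1%}), "
          f"{with_ha:.0f} at {max_avg_load(n):.0%} load ({with_ha / target:.1%})")
```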
Project example: target architecture
Comparison blades vs bullion
blades:
– 1 blade: 4 sockets E7-4870, 40 cores / 256 GB (32 DIMMs of 8 GB; max 48 DIMMs)
– + 1 chassis + Fabric Extender + 1 Fabric Interconnect switch
– 7U + 2U, 1,553 watts
bullion:
– 1 bullion module: 4 sockets E7-4870, 40 cores / 256 GB (32 DIMMs of 8 GB; max 64 DIMMs)
– 3U, 900 watts (-42%)
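The -42% is the relative power reduction from 1,553 W to 900 W. A one-function sketch; `saving` is an illustrative name.

```python
def saving(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`."""
    return 1 - after / before


blade_watts, bullion_watts = 1553, 900
print(f"power: -{saving(blade_watts, bullion_watts):.0%}")
```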
Project example: initial proposal
Fermat / Galois
Needs: 752 vCPUs => /5 = 150 cores; 1,504 GB vRAM => x 0.7 = 1,052 GB
blades: 4 blades => 4 ESXi; 16 sockets (160 cores); 1,024 GB; 18 U; 5,256 watts
bullion: 4 bullion servers => 4 ESXi; 16 sockets (160 cores); 1,024 GB; 12 U; 3,600 watts (-32%)
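The "Needs" line divides vCPUs by a CPU over-commitment ratio of 5 and multiplies vRAM by 0.7; the deck does not state the rationale for the 0.7 factor. A sketch applying both factors to the three project phases; `needs` is a hypothetical helper, and the deck rounds the results (150, 360 and 570 cores; 1,052, 2,518 and 3,985 GB).

```python
def needs(vcpus: int, vram_gb: float, overcommit: float = 5, ram_factor: float = 0.7):
    """Physical cores and RAM needed, as computed in the project example:
    vCPUs divided by the CPU over-commitment ratio, vRAM times a 0.7 factor."""
    return vcpus / overcommit, vram_gb * ram_factor


for phase, vcpus, vram in [("initial", 752, 1504),
                           ("1st evolution", 1799, 3598),
                           ("2nd evolution", 2847, 5693)]:
    cores, ram = needs(vcpus, vram)
    print(f"{phase}: {cores:.1f} cores, {ram:.1f} GB RAM")
```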
Project example: 1st evolution
Fermat / Galois
Needs: 1,799 vCPUs => /5 = 360 cores; vRAM 3,598 GB => x 0.7 = 2,518 GB
blades: 10 blades => 10 ESXi; 36 sockets (360 cores); 2,592 GB; 32 U; 10,100 watts
vSphere licenses (Enterprise+): 36 sockets x $4152 = $149,500
bullion: 4 bullion servers => 4 ESXi; 32 sockets (320 cores); 2,592 GB; 24 U; 7,200 watts (-29%)
vSphere licenses (Enterprise+): 32 sockets x $4152 = $132,885 (-11%)
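Since vSphere is licensed per socket in this calculation, the relative license saving depends only on the socket counts (32/36 and 48/58). A sketch using the $4,152 per-socket price quoted on the slides; the dollar amounts shown on the slides differ slightly from these exact products, and `license_cost` is an illustrative name.

```python
PRICE_PER_SOCKET = 4152   # vSphere Enterprise+ price per socket quoted in the deck (USD)


def license_cost(sockets: int, price: int = PRICE_PER_SOCKET) -> int:
    """Per-socket licensing, as in the deck's calculation."""
    return sockets * price


for label, blade_sockets, bullion_sockets in [("1st evolution", 36, 32),
                                              ("2nd evolution", 58, 48)]:
    blades, bullions = license_cost(blade_sockets), license_cost(bullion_sockets)
    print(f"{label}: ${blades:,} vs ${bullions:,} ({bullions / blades - 1:+.1%})")
```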
Project example: 2nd evolution
Fermat / Galois
Needs: 2,847 vCPUs => /5 = 570 cores; vRAM 5,693 GB => x 0.7 = 3,985 GB
blades: 16 blades => 16 ESXi; 60 sockets (600 cores); 4,080 GB; 32 U; 14,300 watts
vSphere licenses (Enterprise+): 58 sockets x $4152 = $240,239
bullion: 4 bullion servers => 4 ESXi; 48 sockets (480 cores); 4,080 GB; 36 U; 10,800 watts (-25%)
vSphere licenses (Enterprise+): 48 sockets x $4152 = $199,327 (-17%)
+25% still available for future upgrade without adding servers
Need to add 2 extra chassis (+14U ) in order to add more than 2 CPUs
Example #2
Requirements (split across 2 datacenters for disaster recovery, PRA):
– 428 VMs (spread across 10 application domains)
– 2,756 cores (/40 = 68.9 modules)
– RAM: 9,832 GB
bullion scenarios:
Scenario MONO (mono-module servers):
– Total modules without VM consolidation: 77, i.e. 38.5 per DC, rounded up to 39 per DC
– Total modules: 78
– Total RAM: 9,832 GB
– RAM per server: 126 GB needed, 128 GB installed
– Total cores: 3,120
Scenario QUAD (quad-module servers):
– Total modules with VM consolidation: 68.9, i.e. 34.45 per DC, rounded up to 35 per DC
– Quad-module bullions: 18 in total, 9 per DC
– Total modules: 72 (6 modules less than MONO)
– Total RAM: 9,832 GB needed, 10,368 GB installed
– RAM per server: 546 GB needed, 576 GB installed
– Total cores: 2,880
Scenario QUAD optimized (consolidation -7%):
– Total modules: 64, i.e. 32 per DC (14 modules less than MONO)
– Servers: 8 per DC
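The MONO and QUAD module counts follow from splitting the need across the two datacenters and rounding up at each step. A sketch under one stated assumption: the MONO figure of 77 modules is taken to be driven by RAM at 128 GB per module (9,832 / 128 rounds up to 77). The QUAD optimized case is not modelled, and the helper names are illustrative.

```python
import math

DCS = 2                     # workload split across 2 datacenters
CORES_PER_MODULE = 40       # 4 x 10-core E7 per module
RAM_PER_MONO_GB = 128       # ASSUMPTION: MONO sized on RAM, 128 GB per module


def modules_per_dc(total_needed: float) -> int:
    """Split the need across datacenters and round up in each one."""
    return math.ceil(total_needed / DCS)


# MONO: one module per server, no VM consolidation, sized on RAM (9,832 GB)
mono_needed = math.ceil(9832 / RAM_PER_MONO_GB)        # 77 modules
mono_total = modules_per_dc(mono_needed) * DCS         # 39 per DC -> 78

# QUAD: 4-module servers, 68.9 modules needed with VM consolidation
quad_per_dc = modules_per_dc(68.9)                     # 35 modules per DC
quad_servers_per_dc = math.ceil(quad_per_dc / 4)       # 9 quad bullions per DC
quad_total = quad_servers_per_dc * 4 * DCS             # 72 modules

print(mono_total, quad_total, mono_total - quad_total)
print(mono_total * CORES_PER_MODULE, quad_total * CORES_PER_MODULE)
```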
TCO calculation
TCO over 3 years | UCS B200 | bullion
Capex:
Total hardware | $329,250 | $368,760
Hardware installation | $3,424 | $1,712
VMware licenses | $136,178 | $37,718
Opex:
Hardware administration | $61,635 | $20,545
Hardware maintenance subscription | $19,913 | $16,352
ESXi admin/maintenance | $51,363 | $5,136
Power supply | $214,287 | $65,619
Space use | $11,853 | $11,853
Total | $827,904 | $527,697
Savings with bullion = $300,207 (36%)
Main TCO drivers:
– Quantity of HW to install/maintain
– Number of licenses (based on the number of sockets)
– Power consumption
– Space in the data center
– Number of VMware nodes
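Summing the line items reproduces the totals and the 36% saving. A sketch with the values copied from the table; the line items add up to $827,903 and $527,695, a few dollars away from the totals shown, which does not change the 36% result.

```python
# 3-year TCO lines in USD: (UCS B200, bullion), copied from the table
TCO_3Y = {
    "Total hardware": (329_250, 368_760),
    "Hardware installation": (3_424, 1_712),
    "VMware licenses": (136_178, 37_718),
    "Hardware administration": (61_635, 20_545),
    "Hardware maintenance subscription": (19_913, 16_352),
    "ESXi admin/maintenance": (51_363, 5_136),
    "Power supply": (214_287, 65_619),
    "Space use": (11_853, 11_853),
}

ucs_total = sum(ucs for ucs, _ in TCO_3Y.values())
bullion_total = sum(bull for _, bull in TCO_3Y.values())
savings = ucs_total - bullion_total
print(ucs_total, bullion_total, savings, f"{savings / ucs_total:.0%}")
```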
bullion TCO calculator (Excel file)
Summary
bullion: best performance & capacity (4110 SPECint*rate, 160 cores, 4 TB) => ideal for consolidation
Consolidation:
– HW (sockets, memory, IO adapters)
– VMs
Tools:
– Sizing tool (based on SPECint*rate and the number of cluster nodes)
– TCO tool (comparison against the competition: OPEX, CAPEX, over 3 & 5 years)