CPU Optimizations in the CERN Cloud - February 2016
CPU optimizations in the CERN Cloud
Ops Midcycle - High Performance Computing with OpenStack - Manchester, 2016
Belmiro Moreira ([email protected], @belmiromoreira)
with Arne Wiebalck, Tim Bell, Sean Crosby (Univ. of Melbourne) and Ulrich Schwickerath
What is CERN?
CERN Cloud – LHC and Experiments
CMS detector
https://www.google.com/maps/streetview/#cern
CERN Cloud – AMS
OpenStack at CERN by numbers
• ~5500 compute nodes (~140k cores): ~5300 KVM, ~200 Hyper-V
• ~2800 images (~44 TB in use)
• ~2000 volumes (~800 TB allocated)
• ~2200 users, ~2500 projects
• >17000 VMs running
Chart: number of VMs created (green) and VMs deleted (red) every 30 minutes
The “20% overhead” problem
• When running the batch system on top of the cloud infrastructure, we reach the limit on the total number of hosts in LSF
• On our full-node batch VMs we noticed that the HS06 rating was ~20% lower than on the underlying host
• Smaller VMs behaved much better: ~8% overhead (sum of simultaneous HS06 runs of 4x 8-core VMs on a 32-core host)
HS06 on virtual batch workers

| HWDB HS06 | VM size (cores) | Per-VM HS06 | Total HS06 | Overhead |
|-----------|-----------------|-------------|------------|----------|
| 357±16    | 4x 8            | 82.3±11     | 329        | 7.8%     |
| 357±16    | 2x 16           | 150±5       | 300        | 16%      |
| 357±16    | 1x 32           | 284±11      | 284        | 20.4%    |

Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Testing Optimizations – KSM off
• ATLAS T0 batch VMs show an IOwait of 20-30%
• Compute nodes started to swap even when leaving 2 GB for the OS
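For reference, KSM can be toggled at runtime through sysfs; a minimal sketch, assuming a Linux KVM hypervisor (exact behaviour of the merging counters varies by kernel version):

```shell
# Disable KSM; writing 2 also unmerges all currently shared pages
echo 2 > /sys/kernel/mm/ksm/run

# Inspect how much page merging was going on
grep . /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing
```

On RHEL/CentOS-style hosts KSM is typically driven by the ksm/ksmtuned services, so those need to be stopped as well (`systemctl stop ksm ksmtuned`) for the setting to stick.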
Optimization by numbers – EPT off

Before:

| HWDB HS06 | VM size (cores) | Per-VM HS06 | Total HS06 | Overhead |
|-----------|-----------------|-------------|------------|----------|
| 357±16    | 4x 8            | 82.3±11     | 329        | 7.8%     |
| 357±16    | 2x 16           | 150±5       | 300        | 16%      |
| 357±16    | 1x 32           | 284±11      | 284        | 20.4%    |

After:

| HWDB HS06 | VM size (cores) | Per-VM HS06 | Total HS06 | Overhead | Overhead reduction |
|-----------|-----------------|-------------|------------|----------|--------------------|
| 357±16    | 4x 8            | 87±11       | 348        | 2.5%     | 68%                |
| 357±16    | 2x 16           | 163.5±1     | 327        | 8.4%     | 52%                |
| 357±16    | 1x 32           | 311±1       | 311        | 12.9%    | 37%                |
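EPT is a parameter of the kvm_intel kernel module; a hedged sketch of how turning it off is commonly done (all guests on the host must be shut down before the module can be reloaded):

```shell
# Reload kvm_intel with Extended Page Tables disabled
modprobe -r kvm_intel
modprobe kvm_intel ept=0

# Verify: prints N when EPT is off
cat /sys/module/kvm_intel/parameters/ept

# Persist across reboots
echo "options kvm_intel ept=0" > /etc/modprobe.d/kvm-ept.conf
```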
General virtualization issue?
• Cross-check with SLC6 VMs on Hyper-V:
 - 0.8% HS06 loss on 4x 8-core
 - 3.3% HS06 loss on 1x 32-core SLC6 VM
• No general virtualization overhead issue!
 - Rather a feature or configuration issue
• What’s the difference between the VMs on Hyper-V and KVM?
NUMA
• Hyper-V VMs have their vCPUs pinned to sets that correspond to physical NUMA nodes
• Wider OpenStack support for this is available in Kilo
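From Kilo onwards, a NUMA topology and CPU pinning can be requested per flavor via extra specs; a minimal sketch using a modern openstack client (the flavor name m1.batch is hypothetical; Kilo-era tooling would use `nova flavor-key` instead):

```shell
# Expose two NUMA nodes to the guest and pin its vCPUs to host cores
openstack flavor set m1.batch \
  --property hw:numa_nodes=2 \
  --property hw:cpu_policy=dedicated
```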
NUMA - in the lab
… reduced the overhead to ~3% compared to bare metal
Deploying in production
• EPT off; KSM on; NUMA-aware
• System services add ~1-2% overhead
• Total overhead: ~5%
… and then: extremely slow nodes
• A small fraction of jobs ran 10x slower
 - VMs look OK, actually pretty good
 - Hosts: 30-50% system load, >100k IRQ/s (mostly TLB shoot-downs)
• Load attributed to qemu-kvm
 - ‘perf top’: 90% in _raw_spin_lock
 - ‘systemtap’: paging64_page_fault and kvm_mmu_pte* …
Charts: VM CPU utilization vs. compute node CPU utilization
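The diagnostics described above can be reproduced with standard tools; a hedged sketch of the kind of commands involved (these need root on the compute node and KVM tracepoints in the kernel):

```shell
# System-wide profile with call graphs; look for time dominated by _raw_spin_lock
perf top -g

# TLB shootdown IPI counts per CPU
grep TLB /proc/interrupts

# KVM page-fault tracepoint activity, system-wide, over 10 seconds
perf stat -e kvm:kvm_page_fault -a sleep 10
```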
Back to the drawing board
• Needed to combine the optimizations with EPT on
• Huge pages as a way out?
 - Idea: reduce the number of pages to be handled, increase the hit ratio
• 1GB huge pages
 - Best HS06 results (with EPT on)
• 2MB huge pages
 - Also one of the default sizes
 - Performance loss of around 5% compared to bare metal on batch VMs
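A minimal sketch of reserving 2MB huge pages on the hypervisor (the page count is illustrative; in practice it should match the memory to be backed for the VMs):

```shell
# Reserve 4096 x 2 MB = 8 GB of huge pages at the default page size
echo "vm.nr_hugepages = 4096" > /etc/sysctl.d/90-hugepages.conf
sysctl -p /etc/sysctl.d/90-hugepages.conf

# Check the pool
grep Huge /proc/meminfo
```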
Optimization by numbers
• NUMA + pinning
• 2MB huge pages
• EPT on
• KSM on

| VM sizes (cores) | Overhead before | Overhead after |
|------------------|-----------------|----------------|
| 4x 8             | 7.8%            | 3.3%           |
| 2x 16            | 16%             | 4.6%           |
| 1x 32            | 20.4%           | 3-6%           |
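The combination above maps onto Nova flavor extra specs; a hedged sketch (the flavor name is hypothetical, and hw:mem_page_size requires the host to have a matching huge-page pool reserved):

```shell
# NUMA-aware, pinned, 2 MB huge-page-backed guests in one flavor
openstack flavor set m1.batch \
  --property hw:numa_nodes=2 \
  --property hw:cpu_policy=dedicated \
  --property hw:mem_page_size=2MB
```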
Deploy in production • A small fraction can cause a lot of trouble…
Summary
• Reduced the virtualization HS06 overhead to a few percent compared to bare metal
 - On full-node VMs!
 - NUMA + pinning + huge pages + EPT on + KSM on
• Pre-deployment testing is very difficult
 - EPT off side-effects were initially undetected