Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads
description
Transcript of Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads
![Page 1: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/1.jpg)
Partition and Isolate:Approaches for Consolidating
HPC and Commodity Workloads
Jack LangeAssistant ProfessorUniversity of Pittsburgh
![Page 2: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/2.jpg)
Summary• Commodity and HPC systems have been converging
– Commodity off the shelf components– Linux based HPC systems– Cloud computing
• Problem: Real HPC applications need HPC environments– Tightly coupled, massively parallel, and synchronized– Current services must provide dedicated HPC systems
• Can we co-host HPC applications on commodity systems?
• Dual Stack Approach– Provision the underlying software stack along with application– Commodity stack should handle commodity applications– HPC stack can provide HPC environment
![Page 3: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/3.jpg)
• Current systems do support this, but…• Interference still exists inside the system software– Inherent feature of commodity systems
CoresSocket 1
Memory
1 2
3 4
CoresSocket 2
5 6
7 8
Memory
HPC PartitionCommodity Partition
User Space Partitioning
![Page 4: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/4.jpg)
HPC vs. Commodity Systems
• Commodity systems have fundamentally different focus than HPC systems– Amdahl’s vs. Gustafson’s laws– Commodity: Optimized for common case
• HPC: Common case is not good enough– At large (tightly coupled) scales, percentiles lose meaning– Collective operations must wait for slowest node– 1% of nodes can make 99% suffer– HPC systems must optimize outliers (worst case)
![Page 5: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/5.jpg)
5
High Performance Computing (HPC)
• Large scale simulations to solve Big Problems
![Page 6: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/6.jpg)
Dual Stack Approach
• Partition– Segment the underlying hardware resources– Assign them to exclusively to specific workloads
• Isolate– Prevent interference from other workloads– Hardware: partitions must be course grained– Software: eliminate shared state
• Implementation– Independent system software running on isolated resources
![Page 7: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/7.jpg)
HPC in the cloud• Clouds are starting to look like supercomputers…
– Are we seeing a convergence?• Not yet
– Noise issues– Poor isolation– Resource contention– Lack of control over topology
• Very bad for tightly coupled parallel apps– Require specialized environments that solve these problems
• Approaching convergence– Vision: Dynamically partition cloud resources into HPC and commodity zones– This talk: partitioning compute nodes with performance isolation
![Page 8: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/8.jpg)
Commodity VMMs
• Virtualization is considered an “enterprise” technology– Designed for commodity environments– Fundamentally different, but not wrong!
• Example: KVM architecture issues– Userspace handlers– Fairly complex memory management– Locking and periodic optimizations– Presence of system noise
![Page 9: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/9.jpg)
9
Palacios VMM• OS-independent embeddable virtual machine monitor
– Established compatibility with Linux, Kitten, and Minix
• Specifically targets HPC applications and environments– Consistent performance with very low variance
• Deployable on supercomputers, clusters (Infiniband/Ethernet), and servers– 0-3% overhead at large scales (thousands of nodes)
• VEE 2011, IPDPS 2010, ROSS 2011
http://www.v3vee.org/palaciosOpen source and freely available
![Page 10: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/10.jpg)
Palacios/Linux
• Palacios/Linux provides lightweight and high performance virtualized environments– Internally manages dedicated resources
• Memory and CPU scheduling– Does not bother with “enterprise features”
• Page sharing/merging, swapping, overcommitting resources
• Palacios enables scalable HPC performance on commodity platforms
![Page 11: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/11.jpg)
VMM Comparison
• Primary difference: Consistency– Requirement for tightly coupled performance at large scale
• Example: KVM nested paging architecture– Maintains free page caches to optimize performance
• Requires cache management– Shares page tables to optimize memory usage
• Requires synchronizationVMM % of exits Mean Std Dev # NPFS
KVM 52% 8804 5232 3,265,156
Palacios 50% 10876 2685 1,872,017
![Page 12: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/12.jpg)
Hardware
Palacios VMM
HPC Linux
Linux Kernel
HPC Application
Palacios ResourceManagers
KVM
Linux Module Interface
Commodity Linux
Commodity Application(s)
Dual Stack Architecture• Partitioning at the OS level
• Enable cloud to host both commodity and HPC apps– Each zone optimized for the target applications
![Page 13: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/13.jpg)
Evaluation• Goal: Measure VM isolation properties• Partitioned a single node into HPC and commodity zones
– Commodity Zone: Parallel Kernel compilation– HPC Zone: Set of standard HPC benchmarks– System:
• Dual 6-core AMD Opteron with NUMA topology • Linux guest environments (HPC and commodity)
• Important: Local node only– Does not promise good performance at scale– But, poor performance will magnify at large scales
![Page 14: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/14.jpg)
Results
Palacios delivers consistent performance
Commodity VMMs degrade with contention
MiniFE: Unstructured implicit finite element solverMantevo Project -- https://software.sandia.gov/mantevo/index.html
![Page 15: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/15.jpg)
Discussion• A dual stack approach can provide HPC environments on
commodity clouds– HPC and commodity workloads can dynamically share resources– HPC requirements can be met without fully dedicated resources
• Networking is still an open issue– Need mechanisms for isolation and partitioning– Need high performance networking architectures
• 1Gbit is not good enough• 10Gbit is good, Infiniband is better
– Need control over placement and topologies
![Page 16: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/16.jpg)
Multi-stack Operating Systems• Future Exascale Systems are moving towards in situ organization• Applications traditionally have utilized their own platforms• Visualization, storage, analysis, etc
• Everything must now collapse onto a single platform
![Page 17: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/17.jpg)
What this means for the OS• At Petascale we could optimize each environment separately
– Each had their own OS and hardware• At Exascale workloads will be co-located
– Can a single OS handle all workloads effectively?
• Probably Not– Each has different resource requirements and behaviors– Exascale will need to support multiple OS environments on the same
hardware
![Page 18: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/18.jpg)
18
Current Supercomputer OS architectures
Source: http://en.wikipedia.org/wiki/File:Operating_systems_used_on_top_500_supercomput
ers.svg
LINUX
UNIX
![Page 19: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/19.jpg)
19
Will Linux continue to dominate?
• An open question at this point– Exascale systems may mark a radical departure from traditional
architectures
• Kitten: Open-source Lightweight Kernel from Sandia– Minimal compute node OS
• Provides mostly Linux-compatible user environment– Supports unmodified compiler toolchains and ELF executables
• But doesn’t include the enterprise Linux features– Simple memory management, application managed I/O, etc
![Page 20: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/20.jpg)
Hardware
Palacios VMM
Kitten
Linux Kernel
HPC Application
Palacios ResourceManagers
Linux
Linux Module Interface
Commodity Application(s)
Dual Stack Architecture
• Provide LWK environment on a commodity system– Each zone optimized for the target applications
![Page 21: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/21.jpg)
21
Palacios/Kitten provides higher memory throughput than Linux– 400 MB/s (4.74%)
Palacios/Kitten provides more consistent memory performance than Linux– 340 MB/s lower standard deviation
Stream
![Page 22: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/22.jpg)
22
Selfish Detour
Linux Virtual KittenPalacios/Kitten can reduce noise from Linux• Eliminates Periodic timer interrupts
![Page 23: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/23.jpg)
Better than native?
• Results are preliminary– Followed best practices for configuring Linux – Didn’t try to optimize VMM performance
• Virtualization can improve system performance– when the system is running a commodity OS
B. Kocoloski, J. Lange, Better than Native: Using Virtualization to Improve Compute Node Performance, (ROSS 2012)
![Page 24: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/24.jpg)
Beyond Virtualization• Virtualization imposes overhead– Power: requires transistors– Performance: small, but present– Interference: Still some dependencies on host OS
• Might not be available on exascale hardware
• Can we provide native partitioning?– We think so– Linux provides the ability to dynamically remove resources
(CPUs, memory, etc)– These can be taken over by a second OS
![Page 25: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/25.jpg)
Hardware
Palacio VMMKitten
HPC Application
Linux
Commodity Application(s)
Dual Stack Architecture
• Provide LWK environment on a commodity system– Each zone optimized for the target applications
![Page 26: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/26.jpg)
Approach
• OS partition created via offlined resources– CPUs, memory, PCI devices
• Secondary OS “booted” on offline resources
• Issues:– OS initialization
• Boot process• Resource discovery
– Coordination and communication– Security and safety
![Page 27: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/27.jpg)
Dual Stack Memory
• Maybe we don’t need to provide an entirely separate OS– Instead selectively manage some resources
• Dual stack memory– Provide a separate memory management layer to Linux
• Features– Selectively manage heap per application– Provide applications with direct control over memory layout – Transparently back memory using large pages – Without overhead added by Linux
![Page 28: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/28.jpg)
Hardware
HPC Application
Linux
Commodity Application(s)
Dual Stack Architecture
• Provide LWK memory manager on a commodity OS
Memory Management
![Page 29: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/29.jpg)
Conclusion• Commodity systems are not designed to support HPC workloads
– Different requirements and behaviors than commodity applications
• A multi stack approach can provide HPC environments in commodity systems– HPC requirements can be met without separate physical systems– HPC and commodity workloads can dynamically share resources– Isolated system software environments
![Page 30: Partition and Isolate: Approaches for Consolidating HPC and Commodity Workloads](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681610f550346895dd069a3/html5/thumbnails/30.jpg)
Thank you
Jack LangeAssistant Professor University of Pittsburgh
http://www.cs.pitt.edu/~jacklange