HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange...

43
HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh

Transcript of HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange...

Page 1: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

HPMMAP: Lightweight Memory Management for

Commodity Operating Systems

Brian Kocoloski Jack LangeUniversity of Pittsburgh

Page 2: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Lightweight Experience in a Consolidated Environment• HPC applications need lightweight resource

management• Tightly-synchronized, massively parallel• Inconsistency a huge problem

Page 3: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Lightweight Experience in a Consolidated Environment• HPC applications need lightweight resource

management• Tightly-synchronized, massively parallel• Inconsistency a huge problem

• Problem: Modern HPC environments require commodity OS/R features• Cloud computing / Consolidation with general purpose• In-Situ visualization

Page 4: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Lightweight Experience in a Consolidated Environment• HPC applications need lightweight resource

management• Tightly-synchronized, massively parallel• Inconsistency a huge problem

• Problem: Modern HPC environments require commodity OS/R features• Cloud computing / Consolidation with general purpose• In-Situ visualization

• This talk: How can we provide lightweight memory management in a fullweight environment?

Page 5: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Lightweight vs Commodity Resource Management• Commodity management has fundamentally

different focus than lightweight management• Dynamic, fine-grained resource allocation• Resource utilization, fairness, security• Degrade applications fairly in response to heavy loads

Page 6: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Lightweight vs Commodity Resource Management• Commodity management has fundamentally

different focus than lightweight management• Dynamic, fine-grained resource allocation• Resource utilization, fairness, security• Degrade applications fairly in response to heavy loads

• Example: Memory Management• Demand paging• Serialized, coarse-grained address space operations

Page 7: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Lightweight vs Commodity Resource Management• Commodity management has fundamentally different

focus than lightweight management• Dynamic, fine-grained resource allocation• Resource utilization, fairness, security• Degrade applications fairly in response to heavy loads

• Example: Memory Management• Demand paging• Serialized, coarse-grained address space operations

• Serious HPC Implications• Resource efficiency vs. resource isolation• System overhead• Cannot fully support HPC features (e.g., large pages)

Page 8: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

HPMMAP: High Performance Memory Mapping and Allocation Platform

• Independent and isolated memory management layers• Linux kernel module: NO kernel modifications• System call interception: NO application modifications• Lightweight memory management: NO page faults• Up to 50% performance improvement for HPC apps

Commodity Application HPC ApplicationModified System

Call InterfaceSystem Call

Interface Linux Kernel

Linux Memory Manager HPMMAP

Linux Memory HPMMAP Memory

NODE

Page 9: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Talk Roadmap

• Detailed Analysis of Linux Memory Management• Focus on demand paging architecture• Issues with prominent large page solutions

• Design and Implementation of HPMMAP• No kernel or application modification

• Single-Node Evaluation Illustrating HPMMAP Performance Benefits

• Multi-Node Evaluation Illustrating Scalability

Page 10: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Linux Memory Management• Default Linux: On-demand Paging• Primary goal: optimize memory utilization• Reduce overhead of common behavior (fork/exec)

• Optimized Linux: Large Pages• Transparent Huge Pages• HugeTLBfs• Both integrated with demand paging architecture

Page 11: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Linux Memory Management• Default Linux: On-demand Paging

• Primary goal: optimize memory utilization• Reduce overhead of common behavior (fork/exec)

• Optimized Linux: Large Pages• Transparent Huge Pages• HugeTLBfs• Both integrated with demand paging architecture

• Our work: determine implications of these features for HPC

Page 12: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Transparent Huge Pages

• Transparent Huge Pages (THP)• (1) Page fault handler uses large pages when possible• (2) khugepaged address space merging

Page 13: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Transparent Huge Pages

• Transparent Huge Pages (THP)• (1) Page fault handler uses large pages when possible• (2) khugepaged address space merging

• khugepaged• Background kernel thread• Periodically allocates and “merges” large page into

address space of any process requesting THP support• Requires global page table lock• Driven by OS heuristics – no knowledge of

application workload

Page 14: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Transparent Huge Pages

• Ran miniMD benchmark from Mantevo twice:• As only application• Co-located parallel kernel build

• “Merge” – small page faults stalled by THP merge operation

Page 15: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Transparent Huge Pages

• Ran miniMD benchmark from Mantevo twice:• As only application• Co-located parallel kernel build

• “Merge” – small page faults stalled by THP merge operation

• Large page overhead increased by nearly 100% with added load

Page 16: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Transparent Huge Pages

• Ran miniMD benchmark from Mantevo twice:• As only application• Co-located parallel kernel build

• “Merge” – small page faults stalled by THP merge operation

• Large page overhead increased by nearly 100% with added load• Total number of merges increased by 50% with added load

Page 17: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Transparent Huge Pages

• Ran miniMD benchmark from Mantevo twice:• As only application• Co-located parallel kernel build

• “Merge” – small page faults stalled by THP merge operation

• Large page overhead increased by nearly 100% with added load• Total number of merges increased by 50% with added load• Merge overhead increased by over 300% with added load

Page 18: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Transparent Huge Pages

• Ran miniMD benchmark from Mantevo twice:• As only application• Co-located parallel kernel build

• “Merge” – small page faults stalled by THP merge operation

• Large page overhead increased by nearly 100% with added load• Total number of merges increased by 50% with added load• Merge overhead increased by over 300% with added load• Merge standard deviation increased by nearly 800% with

added load

Page 19: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Transparent Huge PagesNo Competition With Parallel Kernel Build

• Large page faults green, small faults delayed by merges blue• Generally periodic, but not synchronized

5M

Page

Fau

lt Cy

cles

Page

Fau

lt Cy

cles

0 0349 368App Runtime (s) App Runtime (s)

5M

Page 20: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Transparent Huge PagesNo Competition With Parallel Kernel Build

• Large page faults green, small faults delayed by merges blue• Generally periodic, but not synchronized• Variability increases dramatically under load

Page

Fau

lt Cy

cles

Page

Fau

lt Cy

cles

0 0349 368App Runtime (s) App Runtime (s)

5M 5M

Page 21: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

HugeTLBfs

• HugeTLBfs• RAM-based filesystem supporting large page allocation• Requires pre-allocated memory pools reserved by

system administrator• Access generally managed through libhugetlbfs

Page 22: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

HugeTLBfs

• HugeTLBfs• RAM-based filesystem supporting large page allocation• Requires pre-allocated memory pools reserved by

system administrator• Access generally managed through libhugetlbfs

• Limitations• Cannot back process stacks• Configuration challenges• Highly susceptible to overhead from system load

Page 23: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

HugeTLBfs

• Ran miniMD benchmark from Mantevo twice:• As only application• Co-located parallel kernel build

Page 24: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

HugeTLBfs

• Ran miniMD benchmark from Mantevo twice:• As only application• Co-located parallel kernel build

• Large page fault performance generally unaffected by added load• Demonstrates effectiveness of pre-reserved memory pools

Page 25: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

HugeTLBfs

• Ran miniMD benchmark from Mantevo twice:• As only application• Co-located parallel kernel build

• Large page fault performance generally unaffected by added load• Demonstrates effectiveness of pre-reserved memory pools

• Small page fault overhead increases by nearly 475,000 cycles on average

Page 26: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

HugeTLBfs

• Ran miniMD benchmark from Mantevo twice:• As only application• Co-located parallel kernel build

• Large page fault performance generally unaffected by added load• Demonstrates effectiveness of pre-reserved memory pools

• Small page fault overhead increases by nearly 475,000 cycles on average• Performance considerably more variable

• Standard deviation roughly 30x higher than the average!

Page 27: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

HugeTLBfsHPCCG CoMD miniFE

Page

Fau

lt Cy

cles

App Runtime (s)

No Competition

With Parallel Kernel Build

10M

0

3M

0

3M

0

10M

0

3M

0

3M

0

51

60

248

281

54

59

Page 28: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

HugeTLBfsHPCCG CoMD miniFE

Page

Fau

lt Cy

cles

App Runtime (s)

No Competition

With Parallel Kernel Build

10M

0

3M

0

3M

0

10M

0

3M

0

3M

0

51

60

248

281

54

59

• Overhead of small page faults increases substantially• Ample memory available via reserved memory pools, but

inaccessible for small faults• Illustrates configuration challenges

Page 29: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Linux Memory Management: HPC Implications• Conclusions of Linux Memory Management Analysis:

• Memory isolation insufficient for HPC when system is under significant load

• Large page solutions not fully HPC-compatible

• Demand Paging is not an HPC feature• Poses problems when adopting HPC features like large

pages• Both Linux large page solutions are impacted in different

ways

• Solution: HPMMAP

Page 30: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

HPMMAP: High Performance Memory Mapping and Allocation Platform

• Independent and isolated memory management layers• Lightweight Memory Management• Large pages the default memory mapping unit• 0 page faults during application execution

Commodity Application HPC ApplicationModified System

Call InterfaceSystem Call

Interface Linux Kernel

Linux Memory Manager HPMMAP

Linux Memory HPMMAP Memory

NODE

Page 31: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Kitten Lightweight Kernel

• Lightweight Kernel from Sandia National Labs• Mostly Linux-compatible user environment• Open source, freely available

• Kitten Memory Management• Moves memory management as close to application as

possible• Virtual address regions (heap, stack, etc.) statically-sized and

mapped at process creation• Large pages default unit of memory mapping• No page fault handling

https://software.sandia.gov/trac/kitten

Page 32: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

HPMMAP Overview

• Lightweight versions of memory management system calls (brk, mmap, etc.)• “On-request”

memory management• 0 page faults during

application execution• Memory offlining

• Management of large (128 MB+) contiguous regions

• Utilizes vast unused address space on 64-bit systems• Linux has no knowledge of HPMMAP’d regions

Page 33: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Evaluation Methodology

• Consolidated Workloads• Evaluate HPC performance with co-located

commodity workloads (parallel kernel builds)• Evaluate THP, HugeTLBfs, and HPMMAP configurations

• Benchmarks selected from the Mantevo and Sequoia benchmark suites

• Goal: Limit hardware contention• Apply CPU and memory pinning for each workload

where possible

Page 34: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Single Node Evaluation

• Benchmarks• Mantevo (HPCCG, CoMD, miniMD, miniFE)• Run in weak-scaling mode

• AMD Opteron Node• Two 6-core NUMA sockets• 8GB RAM per socket

• Workloads:• Commodity profile A – 1 co-located kernel build• Commodity profile B – 2 co-located kernel builds

• Up to 4 cores over-committed

Page 35: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Single Node Evaluation - Commodity profile A

• Average 8-core improvement across applications of 15% over THP, 9% over HugeTLBfs

HPCCG CoMD

miniMD miniFE

Page 36: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Single Node Evaluation - Commodity profile A

• Average 8-core improvement across applications of 15% over THP, 9% over HugeTLBfs• THP becomes

increasingly variable with scale

HPCCG CoMD

miniMD miniFE

Page 37: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Single Node Evaluation - Commodity profile B

HPCCG CoMD

miniMD miniFE

• Average 8-core improvement across applications of 16% over THP, 36% over HugeTLBfs

Page 38: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Single Node Evaluation - Commodity profile B

HPCCG CoMD

miniMD miniFE

• Average 8-core improvement across applications of 16% over THP, 36% over HugeTLBfs• HugeTLBfs

degrades significantly in all cases at 8 cores – memory pressure due to weak-scaling configuration

Page 39: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Multi-Node Scaling Evaluation• Benchmarks

• Mantevo (HPCCG, miniFE) and Sequoia (LAMMPS)• Run in weak-scaling mode

• Eight-Node Sandia Test Cluster• Two 4-core NUMA sockets (Intel Xeon Cores)• 12GB RAM per socket• Gigabit Ethernet

• Workloads• Commodity profile C – 2 co-located kernel builds per

node• Up to 4 cores over-committed

Page 40: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Multi-Node Evaluation - Commodity profile C

HPCCG miniFE

LAMMPS

• 32 rank improvement: HPCCG - 11%, miniFE – 6%, LAMMPS – 4%

• HPMMAP shows very few outliers• miniFE: impact of single node variability

on scalability (3% improvement on single node• LAMMPS also beginning to show

divergence

Page 41: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Future Work

• Memory Management not the only barrier to HPC deployment in consolidated environments• Other system software overheads• OS noise

• Idea: Fully independent system software stacks• Lightweight virtualization (Palacios VMM)• Lightweight “co-kernel”

• We’ve built a system that can launch Kitten on a subset of offlined CPU cores, memory blocks, and PCI devices

Page 42: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Conclusion

• Commodity memory management strategies cannot isolate HPC workloads in consolidated environments• Page fault performance illustrates effects of contention• Large page solutions not fully HPC compatible

• HPMMAP• Independent and isolated lightweight memory manager• Requires no kernel modification or application modification

• HPC applications using HPMMAP achieve up to 50% better performance

Page 43: HPMMAP: Lightweight Memory Management for Commodity Operating Systems Brian Kocoloski Jack Lange University of Pittsburgh.

Thank You

• Brian Kocoloski• [email protected]• http://people.cs.pitt.edu/~briankoco

• Kitten• https://software.sandia.gov/trac/kitten