Helios: Heterogeneous Multiprocessing with Satellite Kernels

26
Ed Nightingale, Orion Hodson, Ross McIlroy, Chris Hawblitzel, Galen Hunt MICROSOFT RESEARCH Helios: Heterogeneous Multiprocessing with Satellite Kernels 1

description

Helios: Heterogeneous Multiprocessing with Satellite Kernels. Ed Nightingale, Orion Hodson, Ross McIlroy, Chris Hawblitzel, Galen Hunt Microsoft Research. Problem: HW now heterogeneous. Once upon a time…. Heterogeneity ignored by operating systems. RAM. RAM. RAM. RAM. CPU. CPU. CPU. - PowerPoint PPT Presentation

Transcript of Helios: Heterogeneous Multiprocessing with Satellite Kernels

Page 1: Helios: Heterogeneous Multiprocessing with Satellite Kernels

Ed Nightingale, Orion Hodson, Ross McIlroy, Chris Hawblitzel, Galen

Hunt

MICROSOFT RESEARCH

Helios: Heterogeneous Multiprocessing with Satellite Kernels

1

Page 2: Helios: Heterogeneous Multiprocessing with Satellite Kernels

Problem: HW now heterogeneous

Heterogeneity ignored by operating systems

RAM

Programming models are fragmentedStandard OS abstractions are missing

2

CPU CPU

Once upon a time…

CPU

Hardware was homogeneous

CPU CPUCPUCPU

CPU CPUCPUCPU

RAM

CPU CPUCPUCPU

CPU CPUCPUCPU

GP-GPU

RAM

Programmable

NIC

RAM

Single CPU

SMPCMPNUMA

Page 3: Helios: Heterogeneous Multiprocessing with Satellite Kernels

3

SolutionHelios manages ‘distributed system in the

small’ Simplify app development, deployment, and tuning Provide single programming model for heterogeneous

systems

4 techniques to manage heterogeneity Satellite kernels: Same OS abstraction everywhere Remote message passing: Transparent IPC between

kernels Affinity: Easily express arbitrary placement policies to OS 2-phase compilation: Run apps on arbitrary devices

Page 4: Helios: Heterogeneous Multiprocessing with Satellite Kernels

4

ResultsHelios offloads processes with zero code

changes Entire networking stack Entire file system Arbitrary applications

Improve performance on NUMA architectures Eliminate resource contention with multiple kernels Eliminate remote memory accesses

Page 5: Helios: Heterogeneous Multiprocessing with Satellite Kernels

5

Outline

MotivationHelios design

Satellite kernels Remote message passing Affinity Encapsulating many ISAs

EvaluationConclusion

Page 6: Helios: Heterogeneous Multiprocessing with Satellite Kernels

Kernel

Programmable device

Driver interface is poor app interface

Hard to perform basic tasks: debugging, I/O, IPCDriver encompasses services and runtime…an OS!

6

CPU

I/O device

driver1010

AppApp

JIT Sched.Mem

.IPC

Page 7: Helios: Heterogeneous Multiprocessing with Satellite Kernels

7

Satellite kernels provide single interface

Sat. Kernel

CPU Programmable device

App

NUMA

App FSApp

Satellite kernels: Efficiently manage local resources Apps developed for single system call interface μkernel: Scheduler, memory manager, namespace manager

Sat. Kernel

TCP

NUMA

\\

Sat. Kernel

Page 8: Helios: Heterogeneous Multiprocessing with Satellite Kernels

8

Remote Message Passing

Local IPC uses zero-copy message passingRemote IPC transparently marshals data Unmodified apps work with multiple kernels

Sat. Kernel

Programmable device

App

NUMA

App FSApp

Sat. Kernel

TCP

NUMA

\\

Sat. Kernel

Page 9: Helios: Heterogeneous Multiprocessing with Satellite Kernels

9

Connecting processes and services

Applications register in a namespace as servicesNamespace is used to connect IPC channels

/fs/dev/nic0/dev/disk0/services/TCP/services/PNGEater/services/kernels/ARMv5

Satellite kernels register in namespace

Page 10: Helios: Heterogeneous Multiprocessing with Satellite Kernels

10

Where should a process execute?

Three constraints impact initial placement decision1. Heterogeneous ISAs makes migration is difficult2. Fast message passing may be expected3. Processes might prefer a particular platform

Helios exports an affinity metric to applications Affinity is expressed in application metadata and acts as a

hint Positive represents emphasis on communication – zero copy

IPC Negative represents desire for non-interference

Page 11: Helios: Heterogeneous Multiprocessing with Satellite Kernels

11

Affinity Expressed in Manifests

Affinity easily edited by dev, admin, or user

<?xml version=“1.0” encoding=“utf-8”?><application name=TcpTest” runtime=full> <endpoints> <inputPipe id=“0” affinity=“0” contractName=“PipeContract”/> <endpoint id=“2” affinity=“+10” contractName=“TcpContract”/> </endpoints></application>

Page 12: Helios: Heterogeneous Multiprocessing with Satellite Kernels

12

Platform Affinity

Platform affinity processed firstGuarantees certain performance characteristics

X86NUMA

GP-GPUProgrammabl

eNIC

X86NUMA

/services/kernels/vector-CPUplatform affinity = +2

/services/kernels/x86platform affinity = +1

+2

+1 +1

Page 13: Helios: Heterogeneous Multiprocessing with Satellite Kernels

13

Positive Affinity

Represents ‘tight-coupling’ between processes Ensure fast message passing between processes

Positive affinities on each kernel summed

X86NUMA

GP-GPUProgrammabl

eNIC

X86NUMA

/services/TCPcommunication affinity = +1

/services/PNGEatercommunication affinity = +2

/services/antiviruscommunication affinity = +3

X86NUMA

Programmable NIC

+1

+2+5

TCP

PNG A/V

Page 14: Helios: Heterogeneous Multiprocessing with Satellite Kernels

14

Negative Affinity

Expresses a preference for non-interference Used as a means of avoiding resource contention

Negative affinities on each kernel summed

X86NUMA

GP-GPUProgrammabl

eNIC

X86NUMA

/services/kernels/x86platform affinity = +100

/services/antivirusnon-interference affinity = -1 X86

NUMA-1X86

NUMA A/V

Page 15: Helios: Heterogeneous Multiprocessing with Satellite Kernels

15

Self-Reference Affinity

Simple scale-out policy across available processors

X86NUMA

GP-GPUProgrammabl

eNIC

X86NUMA

/services/webservernon-interference affinity = -1

W1-1 -1W2

W3

Page 16: Helios: Heterogeneous Multiprocessing with Satellite Kernels

16

Turning policies into actions

Priority based algorithm reduces candidate kernels by: First: Platform affinities Second: Other positive affinities Third: Negative affinities Fourth: CPU utilization

Attempt to balance simplicity and optimality

Page 17: Helios: Heterogeneous Multiprocessing with Satellite Kernels

17

Encapsulating many architectures

Two-phase compilation strategy All apps first compiled to MSIL At install-time, apps compiled down to available ISAs

MSIL encapsulates multiple versions of a method

Example: ARM and x86 versions of Interlocked.CompareExchange function

Page 18: Helios: Heterogeneous Multiprocessing with Satellite Kernels

18

Implementation

Based on Singularity operating system Added satellite kernels, remote message passing, and affinity

XScale programmable I/O card 2.0 GHz ARM processor, Gig E, 256 MB of DRAM Satellite kernel identical to x86 (except for ARM asm bits) Roughly 7x slower than comparable x86

NUMA support on 2-socket, dual-core AMD machine 2 GHz CPU, 1 GB RAM per domain Satellite kernel on each NUMA domain.

Page 19: Helios: Heterogeneous Multiprocessing with Satellite Kernels

19

Limitations

Satellite kernels require timer, interrupts, exceptions Balance device support with support for basic

abstractions GPUs headed in this direction (e.g., Intel Larrabee)

Only supports two platforms Need new compiler support for new platforms

Limited set of applications Create satellite kernels out of commodity system Access to more applications

Page 20: Helios: Heterogeneous Multiprocessing with Satellite Kernels

20

Outline

MotivationHelios design

Satellite kernels Remote message passing Affinity Encapsulating many ISAs

EvaluationConclusion

Page 21: Helios: Heterogeneous Multiprocessing with Satellite Kernels

21

Evaluation platformNUMA EvaluationXScale

NIC

Kernel

X86 X86

SatelliteKernelNI

CXScale

SatelliteKernel

X86NUMA

Single Kernel

X86NUMA

X86NUMA

SatelliteKernel

X86NUMA

SatelliteKernel

A B A B

Page 22: Helios: Heterogeneous Multiprocessing with Satellite Kernels

22

Offloading Singularity applications

Helios applications offloaded with very little effort

Name LOC LOC changed LOM changedNetworking stack

9600 0 1

FAT 32 FS 14200 0 1TCP test harness

300 5 1

Disk indexer 900 0 1Network driver 1700 0 0Mail server 2700 0 1Web server 1850 0 1

Page 23: Helios: Heterogeneous Multiprocessing with Satellite Kernels

23

Netstack offload

Offloading improves performance as cycles freed Affinity made it easy to experiment with offloading

PNG Size X86 Only uploads/sec

X86+Xscaleuploads/sec

Speedup % reduction in context switches

28 KB 161 171 6% 54%92 KB 55 61 12% 58%150 KB 35 38 10% 65%290 KB 19 21 10% 53%

Page 24: Helios: Heterogeneous Multiprocessing with Satellite Kernels

24

Email NUMA benchmark

Satellite kernels improve performance 39%

0

10

20

30

40

50

60

70

80

90

No Sat. Kernel Sat. Kernel

Emai

ls P

er S

econ

d

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

No Sat. Kernel Sat. KernelIn

stru

ctio

ns P

er C

ycle

(IP

C)

Page 25: Helios: Heterogeneous Multiprocessing with Satellite Kernels

25

Related Work

Hive [Chapin et. al. ‘95] Multiple kernels – single system image

Multikernel [Baumann et. Al. ’09] Focus on scale-out performance on large NUMA

architectures

Spine [Fiuczynski et. al.‘98] Hydra [Weinsberg et. al. ‘08] Custom run-time on programmable device

Page 26: Helios: Heterogeneous Multiprocessing with Satellite Kernels

26

ConclusionsHelios manages ‘distributed system in the small’

Simplify application development, deployment, tuning

4 techniques to manage heterogeneity Satellite kernels: Same OS abstraction everywhere Remote message passing: Transparent IPC between

kernels Affinity: Easily express arbitrary placement policies to OS 2-phase compilation: Run apps on arbitrary devices

Offloading applications with zero code changesHelios code release soon.