Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart...

24
Transkernel: Bridging Monolithic Kernels to Peripheral Cores Liwei Guo , Shuang Zhai, Yi Qiao, and Felix Xiaozhu Lin Purdue ECE

Transcript of Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart...

Page 1: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Transkernel: Bridging Monolithic Kernels to Peripheral Cores

Liwei Guo, Shuang Zhai, Yi Qiao, and Felix Xiaozhu Lin

Purdue ECE

Page 2: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

What is Transkernel? ➔ A novel OS model to run unmodified binary of a monolithic

kernel

➔ on a microcontroller-like core …

➔ of a heterogeneous SoC

➔ Key techniques: dynamic binary translation + kernel service emulation

2

Page 3: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Motivation: Ephemeral tasks in smart thingsPrevalent: push notifications, periodic data logging, etc.

○ User tasks running on a monolithic kernel (e.g., Linux)

Energy-hungry: ~30% or higher battery drain in smart things [1]

Device suspend/resume is the key bottleneck [2]

3[1] Smartphone background activities in the wild: Origin, energy drain, and optimization, Chen et al., MobiCom’15[2] Decelerating Suspend and Resume, Zhai et al., Hotmobile’17

Ephemeraltask

Devicesuspend

Deviceresume

Wakeup Sleep

Time

Page 4: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Why is device suspend/resume so inefficient?Slow power state transitions keep CPU waiting

Difficult to parallelize due to device dependencies

4Understanding modern device drivers, Kadav et al., ASPLOS’12

Page 5: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Our proposal: suspend/resume on a peripheral core

5

● Benefit: Lower idle power and higher busy execution efficiency

● Asymmetric processors○ CPU + Peripheral core

● Heterogeneous, yet similar ISAs○ Same family, different profile ○ e.g., ARMv7a + v7m

● Loose coupling○ CPU can be turned on/off independently

● Shared platform resources○ IRQs○ DRAM

Apple A9 (Chipworks)

Page 6: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Many SoCs fit this hardware model

6

CPUPeripheral Core

Interconnect

DRAM IO Devices

OMAP4460Cortex M3 + A92010

AM572xCortex M4 + A15 2014

iPhone 6Cortex M3 2014

i.MX 7Cortex M4 + A72017

Azure SphereCortex M4 + A72019

Page 7: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Problem statement➔ On a heterogeneous SoC

➔ Given a commodity monolithic kernel (e.g., Linux)

➔ How to offload device suspend/resume kernel phase to the peripheral core?

7

CPU

Existing

CPU PeripheralCore

User Task

Device Resume

Device Suspend

Time The desired workflow

Page 8: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Peripheralkernel

Design space exploration: multikernelHowever…

8

Main kernel

Suspend Resume

Kernel State

CPU Peripheral Core

IOKernel State

Page 9: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Design space exploration: code transplantHowever…

9

Linux kernel

Suspend Resume

Peripheralkernel

Kernel State

CPU Peripheral Core

IO

Drivers

Driver lib

Kernel services & lib

Suspend Resume

Page 10: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Design space exploration: full virtual executionHowever…

10

Linux kernel

Translated Code

Kernel State

CPU Peripheral Core

IO

DBT

> 25x overhead with current DBT

Page 11: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Our proposal: Transkernel● Goal: Linux kernel offloading with affordable overhead● Approach: the peripheral core dynamically translates the kernel binary,

supported by a small set of emulated kernel services

11

CPU Peripheral Core

SuspendResume

CommodityKernel

Kernel StateDRAM

DynamicBinary Translation

IO devs

TranslatedcodeEmu

Page 12: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Transkernel in the design space

12

Linu

x ke

rnel

com

patib

ility

Barrelfish [SOSP’09]M3 [ASPLOS’16]

Execution overhead

QEMU [ATC’05]

Transkernel

K2 [ASPLOS’14]Popcorn [Eurosys’15]

Page 13: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Principle 1: translate stateful code, emulate stateless ● Stateful vs. stateless: whether

the states of the kernel are shared across cores

● Translated code: state-sharing made easy

● Emulated services: drop-in replacement

13

CPU Peripheral Core

kmalloc

Kernel StateDRAM

DynamicBinary Translation

IO devs

kmalloc

schedCommodity

Kernel

Page 14: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Principle 2: identify narrow trans/emulation interface● The interface has to be:

○ Narrow

○ Stable

● Maintenance of emulated services made easy

14

CPU Peripheral Core

Suspend/Resume

CommodityKernel

Kernel StateDRAM

DynamicBinary Translation

IO devs

Translated

emu

Page 15: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Principle 3: specialize for hot paths

● Hot paths: 99% of executions

○ Encounter no errors

○ All needed resources acquired

● Going off? Fall back to CPU

● Simplify DBT implementation on a peripheral core

15

Page 16: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Principle 4: exploit ISA similarity● Between the ISAs of CPU & peripheral core:

○ General purpose registers○ Control flow registers (SP, LR, PC)○ Flag semantics (NZCV)

● Reducing the number of emitted instructions in DBT ● Key to low overhead!

16

Page 17: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

ARK: an ARm transKernel

Platform: OMAP4460 (Cortex A9+M3)

ARK instantiates the principles on Linux● Execute unmodified Linux kernel

binary on the peripheral core● Depend on stable ABIs (only 12

functions + 1 variable)● Focus on hot paths; may fall back to

CPU● Low-overhead ARM v7a -> v7m DBT

17

sched spinlock

virtaddr

deferredwork

IRQhandler

IRQ handler(early)

mutexsem

memalloc

fallback

TranslatedCode(stateful)

delaysleep Emulation

(stateless)

Linuxkernel binary

Device-specific

Driver libs

AccessingLinuxkernel state

privatelib

Stable ABI

Kernel libs

DBTcontexts

DBT Engine

Page 18: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

ARK: the cross-ISA DBT engine● Systemize similar semantics of

ARM v7m & v7a from formal specification [1]

● Most instructions have identical semantics (447)

● Others instructions …○ Side effect○ Constant constraints○ Shift modes

● Our DBT engine correctly executes over 200 million instructions!

18

v7a insncount

Each translated to # of v7m insns

Identity 447 1Side effect 52 3-5Const constraints 22 2-5Shift modes 10 2No counterparts 27 2-5

Total (v7a) 558

[1] Trustworthy Specifications of ARM v8-A and v8-M System Level Architecture, Reid et al., FMCAD’16

Page 19: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

Evaluation

● Does ARK …a. Incur low-overhead?b. Incur tractable engineering efforts?c. Yield efficiency benefit?

● Benchmarks setup• Test the whole suspend/resume phase, driven by a userspace test harness

• Diverse drivers: SD card, Flash drive, MMC controller, USB controller, Regulator, Keyboard, Camera, Bluetooth NIC, Wi-Fi NIC

19

Page 20: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

ARK’s DBT achieves low execution overhead

20

05

10152025

SD Card

Flash

MMC-Ctrl

USB-Ctrl

Regula

tor KBCam BT

Wi-Fi

Suspend

Baseline ARK

05

101520

Resume

25x -> 2.7x

Ove

rhea

d (X

)

Page 21: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

ARK reuses Linux with low efforts● Good code reuse: 10K vs. 40K

and even more

● Good compatibility: multiple versions and configurations of Linux kernel

21

Existing code (unchanged)

Translated 15K SLoCSubstituted w/ emu 25K SLoC

New implementationDBT 9K SLoCEmulation 1K SLoC

Page 22: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

End-to-end execution time & energy ● Time: prolonged execution time● Energy: 34% energy saved● Interesting finding: ARK sees higher DRAM energy

22

0 1 2 3 4 5

Baseline

ARK

Native

Accumulated Time (s)Idle Busy

23

0 100 200 300 400

Baseline

ARK

Native

Energy (mJ)IO DRAM Core busy Core idle

681

Page 23: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

What-if analysis

23

ARK energy: 66% w/o optimizationenergy: 333%

(2.7x, 41%) (13.9x, 41%)

Page 24: Transkernel: Bridging Monolithic Kernels to Peripheral Cores...Motivation: Ephemeral tasks in smart things Prevalent: push notifications, periodic data logging, etc. User tasks runningon

● Transkernel & its key techniques○ An appropriate translation/emulation boundary inside a monolithic kernel○ Exploit ISA similarity

● To OS ○ A new model to span a monolithic kernel over heterogeneous cores

● To DBT○ Efficiency loss can enable efficiency gain○ DBT applies to translate a specific path of a complex software stack!

● To Architects○ A heterogenous SoC friendly to transkernel

Take-home messages

24