The dream is alive! Running Linux containers on an illumos kernel

Bryan Cantrill (@bcantrill), CTO, [email protected]

description

Presentation for #illumos day at #surgecon, 2014. Video can be found at https://www.youtube.com/watch?v=TrfD3pC0VSs Source code is at https://github.com/joyent/illumos-joyent

Transcript of The dream is alive! Running Linux containers on an illumos kernel

Page 1: The dream is alive! Running Linux containers on an illumos kernel

The dream is alive! Running Linux containers on an illumos kernel

CTO

[email protected]

Bryan Cantrill

@bcantrill

Page 2: The dream is alive! Running Linux containers on an illumos kernel

OS emulation: An old idea

• Operating systems have long employed system call emulation to allow binaries from one operating system to run on another on the same instruction set architecture

• Combines the binary footprint of the emulated system with the operational advantages of the emulating system

• Sun first did this with SunOS 4.x binaries on Solaris 2.x

• With Solaris x86, it became possible to run binaries targeted for Linux via SCO’s (open source) “lxrun”

• Packaging innovation in Linux in early 2000s + deeply differentiated technologies in Solaris 10 (e.g. ZFS, DTrace, zones) made Linux emulation more attractive
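To make the idea of system call emulation concrete, here is a minimal sketch of a dispatch table that maps foreign system call numbers onto host operations. It is an illustration only, not how lxrun or any brand is implemented: the handler names, the HANDLERS table, and emulate_syscall are hypothetical, and only two Linux i386 system call numbers are modeled.

```python
import errno
import os

# Linux i386 system call numbers (an illustrative subset).
LINUX_SYS_WRITE = 4
LINUX_SYS_GETPID = 20


def emul_write(fd, data):
    """Map the foreign write() onto the host's write()."""
    return os.write(fd, data)


def emul_getpid():
    """Map the foreign getpid() onto the host's getpid()."""
    return os.getpid()


# Hypothetical dispatch table: foreign syscall number -> host handler.
HANDLERS = {
    LINUX_SYS_WRITE: emul_write,
    LINUX_SYS_GETPID: emul_getpid,
}


def emulate_syscall(number, *args):
    """Return the handler's result, or -ENOSYS if nothing emulates it."""
    handler = HANDLERS.get(number)
    if handler is None:
        return -errno.ENOSYS
    try:
        return handler(*args)
    except OSError as exc:
        # Linux reports failures to user space as negative errno values.
        # A real emulator would also translate host errno numbers into
        # the foreign system's numbering; this sketch passes them through.
        return -exc.errno
```

The work in a real emulation layer lies in the handlers: wherever the two systems differ in semantics, the handler has to bridge the gap instead of passing the call straight through.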

Page 3: The dream is alive! Running Linux containers on an illumos kernel

Rise of zones

• While the problem had become more important, it had also become harder: programs were now more complicated than single-process binaries

• Clear that “lxrun” would only work for applications, not systems — needed a deeper solution

• Fortunately, coincided with the rise of operating system virtualization embodied by zones

• Idea: introduce notion of a branded zone whereby an entire foreign system (a brand) could be emulated within the confines of a zone

Page 4: The dream is alive! Running Linux containers on an illumos kernel

BrandZ: LX-branded zones

• In 2006, team at Sun that included Nils Nieuwejaar and Russ Blaine integrated BrandZ, a Linux branded zone (PSARC 2005/471)

• Support was a user/kernel hybrid: lx system calls bounced back to a user-level emulation library that depended on some in-kernel emulation (e.g. futexes)

• Support was for RHEL 3 (!): glibc 2.3.2 + Linux 2.4

• Remarkable amount of work was done to handle device pathing, signal handling, /proc — and arcana like TTY ioctls, ptrace, etc.

• Worked for a surprising number of binaries!

Page 5: The dream is alive! Running Linux containers on an illumos kernel

What was missing?

• Support was only for 2.4 kernels

• Support for 2.6 required adding new, Linux-only mechanisms that had native analogues (e.g., epoll)

• Only 32-bit was supported

• XVM (the Xen-on-Solaris effort inside of Sun) had much more managerial support and was thought to be a “more supportable” solution

Page 6: The dream is alive! Running Linux containers on an illumos kernel

The decline of the lx brand

After cresting in 2007, contributions to lx dwindled:

[Figure: pushes to usr/src/lib/brand/lx per year, 2006 to 2010; y-axis from 0 to 30]

Page 7: The dream is alive! Running Linux containers on an illumos kernel

Clinically dead

The lx brand was removed on June 11, 2010...

[Figure: pushes to usr/src/lib/brand/lx per year, 2006 to 2013; y-axis from 0 to 30]

Page 8: The dream is alive! Running Linux containers on an illumos kernel

The organ donation years

• Joyent customers asked for SmartOS to support htop, a colorful Linux program for system process monitoring

• htop is very, very specific to Linux /proc — and porting it to use illumos /proc seemed arduous and pointless…

• ...but a relatively complete Linux /proc had integrated with the LX brand!

• In April 2012, the /proc portion of the LX brand was extracted, cleaned up, and separately integrated

• Mounted at /system/lxproc in SmartOS zones; htop modified to look for this path on illumos

Page 9: The dream is alive! Running Linux containers on an illumos kernel

Exhumed!

• In January 2014, David Mackay, an illumos community member, announced that he was able to resurrect the lx brand, and that it appeared to work!

Linked below is a webrev which restores LX branded zones support to Illumos:

http://cr.illumos.org/~webrev/DavidJX8P/lx-zones-restoration/

I have been running OpenIndiana, using it daily on my workstation for over a month with the above webrev applied to the illumos-gate and built by myself.

It would definitely raise interest in Illumos. Indeed, I have seen many people who are extremely interested in LX zones.

The LX zones code is minimally invasive on Illumos itself, and is mostly segregated out.

I hope you find this of interest.

Page 10: The dream is alive! Running Linux containers on an illumos kernel

Could it be revived?

• David’s work inspired us to rethink LX-branded zones...

• It seemed that the reasons for the discontinuation of LX brand support might not still be valid...

• ...and it seemed that the engineering challenges might not be as structurally daunting

Page 11: The dream is alive! Running Linux containers on an illumos kernel

Has Linux made it easier?

• Linux is moving much more slowly: the pace of development of new user-visible kernel abstractions has slowed

• Torvalds discovered religion on ABI compatibility

• The need to run on older kernels has dissuaded software from using the more obscure Linux-isms

• The glibc/kernel disconnect means that glibc (and apps!) must reasonably be able to process ENOSYS

• Easier support model: the rise of the cloud has replaced shrink-wrapped software with open source + SaaS

• Server focus: Mac OS X gave us Unix — and relegated “Linux on the desktop” to “Duke Nukem Forever” status
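The ENOSYS point above can be illustrated with a small sketch of the fallback pattern: try the newer interface, and if the kernel reports that the call does not exist, quietly use an older one. The function names are hypothetical stand-ins, not real system calls, and the pattern is shown in Python purely for brevity.

```python
import errno


def call_with_fallback(preferred, fallback, *args):
    """Try the newer interface; fall back if the kernel lacks it."""
    try:
        return preferred(*args)
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            raise  # a real failure, not a missing system call
        return fallback(*args)


def newer_call(n):
    """Stand-in for a syscall that this (hypothetical) kernel lacks."""
    raise OSError(errno.ENOSYS, "Function not implemented")


def older_call(n):
    """Stand-in for the long-established equivalent."""
    return n * 2
```

Software written this way tolerates an emulating kernel that has not yet implemented every newer call, as long as the unimplemented call fails with ENOSYS.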

Page 12: The dream is alive! Running Linux containers on an illumos kernel

Have motivations changed?

• Originally, LX branded zones were about bringing Linux applications into established Solaris environments for purposes of hardware consolidation

• Port of KVM to illumos circa 2011 solved this problem

• ...but KVM has unresolvable performance and resource limitations, and Linux on KVM only gets indirect benefit from ZFS, DTrace and zones

• At the same time, enthusiasm for containers and OS-based virtualization has blossomed (ht: Docker)

• There seems to be desire for a best-of-all worlds system that combines Linux strengths (binary footprint) with illumos technical differentiators (ZFS, zones, DTrace)

Page 13: The dream is alive! Running Linux containers on an illumos kernel

Reviving LX-branded zones

• Encouraged that the body might not have decomposed, Joyent engineer Jerry Jelinek exhumed the LX brand and reintegrated it into SmartOS on March 20, 2014

• Guiding principles:

• Do it all in the open

• Do it all on SmartOS master (illumos-joyent)

• Add base illumos facilities wherever possible

• Aim to upstream to illumos when we’re done

• Thanks to Jerry grinding out many, many LX bug fixes, got Ubuntu 10.04 booting in April, Ubuntu 12.04 booting in May and Ubuntu 14.04 booting in July

Page 14: The dream is alive! Running Linux containers on an illumos kernel

IT’S ALIVE!

Contributions to the lx brand since March:

[Figure: pushes to usr/src/lib/brand/lx per year, 2006 to 2014; y-axis from 0 to 100]

Page 15: The dream is alive! Running Linux containers on an illumos kernel

So what have we done?

• Fixed a ton of bugs (ht: LTP)

• Added native epoll(5) — though not in terms of event ports but rather in terms of poll(7D)

• Added exclusive IP stacks for LX-branded zones

• Added support for netlink (RFC 3549) — but restricted that support to the lx brand

• Added support for thunk-less native binaries within an LX branded zone

• Added native inotify(5)

• Added initial 64-bit support
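From the application's side, the epoll interface that an LX-branded zone has to honor looks roughly like the following. This is a minimal sketch using Python's select.epoll wrapper, which exists only where the platform provides epoll. The helper names wait_readable and demo are hypothetical, and nothing here reflects how the illumos implementation works internally.

```python
import os
import select


def wait_readable(fd, timeout=1.0):
    """Return True if fd becomes readable within timeout seconds."""
    ep = select.epoll()
    try:
        ep.register(fd, select.EPOLLIN)
        events = ep.poll(timeout)
        return any(ready_fd == fd and (mask & select.EPOLLIN)
                   for ready_fd, mask in events)
    finally:
        ep.close()


def demo():
    """Check a pipe before and after data is written to it."""
    r, w = os.pipe()
    try:
        before = wait_readable(r, timeout=0)  # nothing written yet
        os.write(w, b"x")
        after = wait_readable(r)
        return before, after
    finally:
        os.close(r)
        os.close(w)
```

A Linux binary making these calls expects exactly these semantics, regardless of which kernel mechanism serves them underneath.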

Page 16: The dream is alive! Running Linux containers on an illumos kernel

What is left to do?

• vsyscall support (needed for 64-bit)

• Anything else for 64-bit

• Stack switching (needed for Go)

• Multi-threaded ptrace support

• Lots of using it and figuring out what breaks!

Page 17: The dream is alive! Running Linux containers on an illumos kernel

How can you get involved?

• SmartOS contains latest-and-greatest bits; first step is to get SmartOS running

• We have a 32-bit Ubuntu 14.04 image that can be used to create a zone via vmadm:

b7493690-f019-4612-958b-bab5f844283e

• Will need to configure a VM with “kernel-version” set to 3.13.0 and “brand” to “lx” in the vmadm JSON payload

• If you find that something is broken, create an issue on the illumos-joyent GitHub repo

• Once 64-bit is working, we will be very actively seeking community engagement; stay tuned!
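As a rough sketch of the vmadm step above, the following builds a minimal JSON payload for an LX-branded zone. Only the image UUID, the "lx" brand, and the 3.13.0 kernel version come from the slide. The alias is hypothetical, the underscore spellings of the keys (kernel_version, image_uuid) are assumptions to check against the vmadm documentation (the slide writes “kernel-version”), and a usable payload will likely also need memory, storage, and network settings.

```python
import json

# Image UUID of the 32-bit Ubuntu 14.04 image named on the slide.
IMAGE_UUID = "b7493690-f019-4612-958b-bab5f844283e"


def lx_payload(alias, image_uuid=IMAGE_UUID, kernel_version="3.13.0"):
    """Build a minimal (assumed) payload for an LX-branded zone."""
    return {
        "alias": alias,                    # hypothetical zone name
        "brand": "lx",                     # from the slide
        "kernel_version": kernel_version,  # key spelling is an assumption
        "image_uuid": image_uuid,          # key spelling is an assumption
    }


if __name__ == "__main__":
    # Print the payload so it can be saved to a file for vmadm.
    print(json.dumps(lx_payload("lx-ubuntu-1404"), indent=2))
```

The printed JSON would be saved to a file and passed to vmadm when creating the zone.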

Page 18: The dream is alive! Running Linux containers on an illumos kernel

Thanks!

• The original BrandZ team at Sun for a remarkable amount of work: Nils Nieuwejaar and Russ Blaine

• The illumos community — especially David Mackay! — for inspiring the revival

• Jerry Jelinek for leading the charge — and doing the vast majority of the work!

• @rmustacc for thunk-less native binary support

• @jmclulow for stack switching

• @djhoffma for his work on ptrace

• @joshwilsdon for vmadm support for LX brands