Build Systems à la Carte

1 October 2019, London

Andrey Mokhov @andreymokhov

Joint work with Neil Mitchell and Simon Peyton Jones

https://twitter.com/andreymokhov

In the beginning…

The new world

Make

Ninja (originally Chromium browser)

Shake (originally Standard Chartered)

CloudBuild (Microsoft)

Bazel (Google)

Buck (Facebook)

Dune (Jane Street)

The new world

Make

Excel

Ninja (originally Chromium browser)

Shake (originally Standard Chartered)

CloudBuild (Microsoft)

Bazel (Google)

Buck (Facebook)

Nix (package management)

Dune (Jane Street)

???

The questions we address

These are huge, complicated artefacts each embodying many,

many design choices, some essential and some incidental.

– Which one should I use? How do they compare with each other?

– What exactly does it mean for a build system to be “correct”?

– Can we combine a good property X from system A with good

property Y from system B? Or are X and Y somehow in conflict?

– What is a “Frankenbuild” and is my system susceptible to it?

– What unexplored variants exist?

What are the “good properties” that we want?

1. Minimality

Full rebuild Partial rebuild

Don’t repeat work unnecessarily

2. Early cutoff

Add a comment to main.c and rebuild

Stop when nothing changes

3. Cloud build

Shallow build Partial rebuild

Save repeating work by sharing

build results among all developers

4. Dynamic dependencies

Dependencies of a task depend on the values of other dependencies

Example in Excel

Tasks:

    | A  | B                   | C
  1 |    | =INDIRECT("A" & C1) |
  2 |    |                     |

Values:

    | A  | B  | C
  1 | 10 | 10 | 1
  2 | 20 |    |

Dynamic dependencies are super important

Dynamic dependencies are ubiquitous

– For example, #include files

Numerous workarounds

– GHC’s Make-based build system has multiple “build phases”

– Each phase: 1) Analyses files to generate dependencies

2) Adds those dependencies to the Makefile

3) Runs Make again

But this approach is endlessly painful and fragile, and doesn’t really work

Our conclusion: we really really want dynamic dependencies

#ifdef LINUX
#include "linux.h"
#endif

Making it concrete

Building small, but fully executable, build systems in Haskell

Vocabulary

Goal: bring up to date a store that maps keys to values

                         | Meaning                                                                   | Typical build system | Excel
Key (k)                  | Name of a thing                                                           | File name            | Cell address
Value (v)                | Value of the thing                                                        | File contents        | Value of the cell
Store                    | Maps a key to its value                                                   | File system          | Grid
Task (user specified)    | How to compute the new value of a key, given values of its dependencies   | Build rules          | Formulas
Dependencies (of a task) | The keys whose values must be known before the task can complete          |                      |

Build systems

type Build c i k v = Tasks c k v -> k -> Store i k v -> Store i k v

k  Keys

v  Values

i  Information that the build system keeps from one run to the next

c  The effect structure that a Task can have, typically c = Applicative or c = Monad


type Tasks c k v = k -> Maybe (Task c k v)

Nothing ⇒ this key is an input

Tasks

main.exe: util.o main.o
    gcc util.o main.o -o main.exe

util.o: util.h util.c
    gcc -c util.c

main.o: util.h main.c
    gcc -c main.c

Tasks




Input keys

A build task and the corresponding output key

d(main.o) = {util.h, main.c}

d(util.o) = {util.h, util.c}

d(main.exe) = {util.o, main.o}

Extracting static dependencies




The Makefile language allows the users to specify tasks only with static

dependencies. As functional programmers say, it has Applicative structure.
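The dependency sets d(...) above can be reproduced with a small sketch. It encodes the Makefile as Applicative tasks and runs each task in the Const functor, so nothing is built and only the fetched keys are recorded. The names makefile, compile, link and d are hypothetical, and "compiling" is faked with string concatenation; this is an illustration under those assumptions, not the code from the paper.

```haskell
{-# LANGUAGE ConstraintKinds, RankNTypes #-}
import Data.Functor.Const (Const (..))

-- A task computes a value, given a callback for fetching its dependencies.
newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)

-- Nothing means that the key is an input.
type Tasks c k v = k -> Maybe (Task c k v)

-- Assumption: values are plain strings, and compiling or linking is
-- faked by string concatenation.
compile, link :: String -> String -> String
compile header source = "obj(" ++ header ++ "," ++ source ++ ")"
link a b = "exe(" ++ a ++ "," ++ b ++ ")"

-- Hypothetical encoding of the three Makefile rules.
makefile :: Tasks Applicative FilePath String
makefile "main.exe" = Just (Task (\fetch -> link    <$> fetch "util.o" <*> fetch "main.o"))
makefile "util.o"   = Just (Task (\fetch -> compile <$> fetch "util.h" <*> fetch "util.c"))
makefile "main.o"   = Just (Task (\fetch -> compile <$> fetch "util.h" <*> fetch "main.c"))
makefile _          = Nothing

-- Run a task in the Const functor: only the fetched keys are collected.
dependencies :: Task Applicative k v -> [k]
dependencies (Task task) = getConst (task (\k -> Const [k]))

-- d(key): the static dependencies of a key; input keys have none.
d :: FilePath -> [FilePath]
d key = maybe [] dependencies (makefile key)
```

Because the tasks only use the Applicative interface, the list of fetched keys cannot depend on any fetched value, which is what makes this extraction possible before the build starts.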

Classifying tasks

What are interesting choices of c in Task c k v?

– Functor: tasks with exactly one static dependency (Docker)

– Applicative: tasks with static dependencies (Make)

– Selective: tasks with conditional static dependencies (Dune)

– Monad: tasks with dynamic dependencies (Shake)

– MonadPlus/MonadRandom: tasks with non-determinism,

for example A1 = RANDBETWEEN(1,3) in Excel

– MonadState i: tasks with access to persistent information
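To make the Applicative versus Monad distinction concrete, here is a minimal sketch of both kinds of task, using the INDIRECT example from the earlier Excel slide. The names run, sumA, indirectB1 and cellsFetch are hypothetical, and evaluating in Maybe over an association list is an assumption chosen to keep the example small.

```haskell
{-# LANGUAGE ConstraintKinds, RankNTypes #-}

newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)

-- Run a task in any context f that satisfies the constraint c.
run :: c f => Task c k v -> (k -> f v) -> f v
run (Task task) fetch = task fetch

-- Applicative: the dependencies (A1 and A2) are fixed before anything is fetched.
sumA :: Task Applicative String Integer
sumA = Task (\fetch -> (+) <$> fetch "A1" <*> fetch "A2")

-- Monad: B1 = INDIRECT("A" & C1). The second key depends on the value of C1.
indirectB1 :: Task Monad String Integer
indirectB1 = Task (\fetch -> do
    c1 <- fetch "C1"            -- first fetch the selector cell
    fetch ("A" ++ show c1))     -- then fetch A1, A2, ... depending on its value

-- Assumed evaluator: look cells up in an association list; a missing cell
-- makes the whole computation Nothing.
cellsFetch :: [(String, Integer)] -> String -> Maybe Integer
cellsFetch cells k = lookup k cells
```

With A1 = 10, A2 = 20 and C1 = 1, running indirectB1 fetches A1; changing C1 to 2 makes the same task fetch A2 instead, which no static analysis of the task can predict.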

Schedulers and rebuilders

Our main insight: two axes

Scheduler: finds the order in which to execute tasks

Rebuilder: decides whether to execute a task, or to re-use the previous result

Make Excel

Ninja Shake

CloudBuild Bazel

Buck Nix

Three schedulers, four rebuilders

Rebuilders (rows) and schedulers (columns):

                         | topological | restarting | suspending
Dirty bit                | Make        | Excel      |
Verifying traces         | Ninja       |            | Shake
Constructive traces      | CloudBuild  | Bazel      |
Deep constructive traces | Buck        |            | Nix

There are many, many other build systems, but they almost all live here

Directly reflected in Haskell model

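Rebuilders (rows, with the corresponding function in the model) and schedulers (columns):

                         | Rebuilder function | topological | restarting | suspending
Dirty bit                | modTimeRebuilder   | Make        | Excel      |
Verifying traces         | vtRebuilder        | Ninja       |            | Shake
Constructive traces      | ctRebuilder        | CloudBuild  | Bazel      |
Deep constructive traces | dctRebuilder       | Buck        |            | Nix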

type Scheduler c i ir k v = Rebuilder c ir k v -> Build c i k v

type Rebuilder c ir k v = k -> v -> Task c k v -> Task (MonadState ir) k v

make :: Ord k => Build Applicative (MakeInfo k) k v
make = topological modTimeRebuilder

What good properties do we get?

Starting from the table of schedulers and rebuilders, rule out the cells that lack a property we want:

– No dynamic dependencies: the topological scheduler (Make, Ninja, CloudBuild, Buck). This leaves Excel, Shake, Bazel and Nix.

– Not minimal: Excel and Bazel (the restarting scheduler). This leaves Shake and Nix.

– No cloud builds: Shake (verifying traces). This leaves Nix.

– No cutoff: Nix (deep constructive traces).

The cell that remains is constructive traces with the suspending scheduler: Cloud Shake.

Constructive traces: CloudBuild (topological), Bazel (restarting), Cloud Shake (suspending)



cloudShake :: (Ord k, Hashable v) => Build Monad (CT k v) k v
cloudShake = suspending ctRebuilder

Yes, we can have simultaneously:

– Minimality

– Dynamic dependencies

– Early cutoff

– Cloud build

And it’s one line of code :-)

It’s only a model!

– Our models are 20-30 lines for a build system

– Real build systems are millions of lines

– We don’t model:

– Parallelism

– Non-determinism

– Failure and recovery

– Networks and protocols

– Caching and eviction (of the cloud cache)

– Cleaning

– User interface

– etc.

Further reading and coding

Read the paper: “Build Systems à la Carte”, ICFP 2018 (29 pages)

– Introduction to how major build systems work

– Definitions of key properties and abstractions

– Store, Task, Scheduler, Rebuilder, Build, build system correctness,

verifying and constructive traces, Frankenbuilds, etc

– Executable build systems models and engineering aspects

Implementation: https://github.com/snowleopard/build

Blog posts:

– Inside the paper (reflection on how we wrote this paper): https://neilmitchell.blogspot.com/2018/07/inside-paper-build-systems-la-carte.html

– The Task abstraction (what we can do with tasks): https://blogs.ncl.ac.uk/andreymokhov/the-task-abstraction/

Extra slides

Deep constructive traces

Instead of tracing immediate dependencies, trace terminal inputs

Downsides: cannot support early cutoff, can cause Frankenbuilds

Frankenbuilds

1. Initial build

2. Clean up, evict main.prof

3. Build main.prof, then report.txt

Combination of task non-determinism and deep constructive traces

Often leads to segfaulting executables because of inconsistent linking

Tasks

newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)

k  Keys

v  Values

f  The computational context in which the task is run, e.g. f = IO or f = State (Map k v)

c  The effects that a Task can have, such as c = Applicative or c = Monad

Task: how to build a value when given a way to build its dependencies

(k -> f v)  Callback: given a key the task needs, compute its value

f v  Computed value of the task

Task example

sprsh1 :: Tasks Applicative String Integer
sprsh1 "B1" = Just $ Task $ \fetch -> ((+) <$> fetch "A1" <*> fetch "A2")
sprsh1 "B2" = Just $ Task $ \fetch -> ((*2) <$> fetch "B1")
sprsh1 _    = Nothing

    | A  | B
  1 | 10 | =A1 + A2
  2 | 20 | =B1 * 2



Extracting static dependencies

newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)

The Const functor is defined in the standard module Data.Functor.Const:

newtype Const m a = Const { getConst :: m }

instance Monoid m => Applicative (Const m) where
    pure _              = Const mempty
    Const x <*> Const y = Const (x <> y)

dependencies :: Task Applicative k v -> [k]
dependencies (Task task) = getConst $ task (\k -> Const [k])

Build system example

busy :: Eq k => Build Monad () k v
busy tasks key store = execState (fetch key) store
  where
    fetch :: k -> State (Store () k v) v
    fetch k = case tasks k of
        Nothing          -> get k
        Just (Task task) -> do v <- task fetch
                               put k v
                               return v

Invoke task with f = State (Store () k v)

get :: k -> State (Store () k v) v
put :: k -> v -> State (Store () k v) ()

Not a very good build system…
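For readers who want to run busy, here is a self-contained sketch. The Store representation (a record holding the persistent information and a function from keys to values), the helpers getValue and putValue, and the spreadsheet sprsh are assumptions made for this example; the slide's get and put are expressed with gets and modify from the transformers package.

```haskell
{-# LANGUAGE ConstraintKinds, RankNTypes, ScopedTypeVariables #-}
import Control.Monad.Trans.State (State, execState, gets, modify)

newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)
type Tasks c k v = k -> Maybe (Task c k v)

-- Assumed store: persistent information plus a total function from keys to values.
data Store i k v = Store { info :: i, values :: k -> v }

getValue :: k -> Store i k v -> v
getValue k store = values store k

putValue :: Eq k => k -> v -> Store i k v -> Store i k v
putValue k v store = store { values = \key -> if key == k then v else values store key }

type Build c i k v = Tasks c k v -> k -> Store i k v -> Store i k v

-- Rebuild the target and all of its dependencies from scratch, every time.
busy :: forall k v. Eq k => Build Monad () k v
busy tasks key store = execState (fetch key) store
  where
    fetch :: k -> State (Store () k v) v
    fetch k = case tasks k of
      Nothing          -> gets (getValue k)      -- input: read it from the store
      Just (Task task) -> do
        v <- task fetch                          -- run the task with f = State (Store () k v)
        modify (putValue k v)                    -- record the freshly computed value
        return v

-- Hypothetical spreadsheet: B1 = A1 + A2, B2 = B1 * 2.
sprsh :: Tasks Monad String Integer
sprsh "B1" = Just (Task (\fetch -> (+) <$> fetch "A1" <*> fetch "A2"))
sprsh "B2" = Just (Task (\fetch -> (* 2) <$> fetch "B1"))
sprsh _    = Nothing

-- Initial store with A1 = 10 and every other cell set to 20.
inputs :: Store () String Integer
inputs = Store () (\key -> if key == "A1" then 10 else 20)
```

Building "B2" with busy sprsh "B2" inputs recomputes B1 and then B2. Since busy keeps no information between runs (i = ()), a second run repeats all of that work, which is why it is not minimal.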

Why is a Task so polymorphic?

A task must work for all f that satisfy the constraint c,

such as c = Applicative, or c = Monad

Main idea: use the same task in different ways

– Execute the task by using a stateful f,

such as f = IO (e.g. Make) or f = State (Map k v) (e.g. Excel)

– Extract accurate task dependencies by using a special f,

such as f = Const [k]

newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)
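A small sketch of the two uses follows, applied to one and the same task. The task b1, the function compute and the lookup function cells are hypothetical names; Identity stands in for a stateful f such as IO or State only to keep the example pure.

```haskell
{-# LANGUAGE ConstraintKinds, RankNTypes #-}
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)

-- One task definition: B1 = A1 + A2.
b1 :: Task Applicative String Integer
b1 = Task (\fetch -> (+) <$> fetch "A1" <*> fetch "A2")

-- Use 1: f = Const [k] records the fetched keys without computing anything.
dependencies :: Task Applicative k v -> [k]
dependencies (Task task) = getConst (task (\k -> Const [k]))

-- Use 2: f = Identity computes the value from a pure lookup function
-- (an assumption of this sketch, in place of f = IO or f = State).
compute :: Task Applicative k v -> (k -> v) -> v
compute (Task task) lookupValue = runIdentity (task (Identity . lookupValue))

-- Hypothetical cell contents: A1 = 10, A2 = 20, everything else 0.
cells :: String -> Integer
cells "A1" = 10
cells "A2" = 20
cells _    = 0
```

Both functions instantiate the same polymorphic task at a different f, which is only possible because the task is not allowed to assume anything about f beyond the constraint c.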

Scheduler example: topological

topological :: Ord k => Scheduler Applicative i i k v
topological rebuilder tasks target = execState $ mapM_ build order
  where
    build :: k -> State (Store i k v) ()
    build key = case tasks key of
        Nothing   -> return ()
        Just task -> do
            store <- get
            let value = getValue key store
                newTask :: Task (MonadState i) k v
                newTask = rebuilder key value task
                fetch :: k -> State i v
                fetch k = return (getValue k store)
            newValue <- liftStore (run newTask fetch)
            modify $ putValue key newValue
    order = topSort (reachable dep target)
    dep k = case tasks k of { Nothing -> []; Just task -> dependencies task }

The scheduler used by Make: build tasks in a topological order

– Extract static dependencies

– Build tasks in a linear order precomputed before the build

– newTask skips up-to-date keys
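The slide does not show how the build order is computed. As a rough sketch, a post-order depth-first traversal from the target gives a valid order. The function buildOrder below is a hypothetical stand-in for topSort (reachable dep target), and it assumes the dependency graph is acyclic.

```haskell
-- Post-order depth-first traversal: every key appears after all of its
-- dependencies. Assumes the dependency graph has no cycles.
buildOrder :: Eq k => (k -> [k]) -> k -> [k]
buildOrder dep target = reverse (visit [] target)
  where
    -- 'done' holds the keys finished so far, most recently finished first.
    visit done k
      | k `elem` done = done
      | otherwise     = k : foldl visit done (dep k)

-- Dependencies of the Makefile example; input keys have none.
makefileDep :: String -> [String]
makefileDep "main.exe" = ["util.o", "main.o"]
makefileDep "util.o"   = ["util.h", "util.c"]
makefileDep "main.o"   = ["util.h", "main.c"]
makefileDep _          = []
```

For the Makefile example, buildOrder makefileDep "main.exe" lists util.o and main.o before main.exe, with each object file preceded by its own inputs. A linear list search is used only for brevity; a real scheduler would use a set.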

Rebuilder example: dirtyBitRebuilder

type Chain k = [k]
type ExcelInfo k = (k -> Bool, Chain k)

dirtyBitRebuilder :: Rebuilder Monad (k -> Bool) k v
dirtyBitRebuilder key value task = Task $ \fetch -> do
    isDirty <- get
    if isDirty key then run task fetch else return value

excel :: Ord k => Build Monad (ExcelInfo k) k v
excel = restarting dirtyBitRebuilder

The rebuilder used by Excel: rebuilds a task if it is marked dirty

Dirty bit
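The rebuilder can be tried in isolation, without a scheduler. The sketch below assumes the mtl package for MonadState; the task b1, the callback fetchCell and the helper rebuildB1 are hypothetical names introduced only for this example.

```haskell
{-# LANGUAGE ConstraintKinds, FlexibleContexts, RankNTypes #-}
import Control.Monad.State (MonadState, State, evalState, get)

newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)

run :: c f => Task c k v -> (k -> f v) -> f v
run (Task task) fetch = task fetch

type Rebuilder c ir k v = k -> v -> Task c k v -> Task (MonadState ir) k v

-- Re-run the task only if its key is marked dirty; otherwise reuse the old value.
dirtyBitRebuilder :: Rebuilder Monad (k -> Bool) k v
dirtyBitRebuilder key value task = Task (\fetch -> do
    isDirty <- get
    if isDirty key then run task fetch else return value)

-- Hypothetical task: B1 = A1 + A2.
b1 :: Task Monad String Integer
b1 = Task (\fetch -> (+) <$> fetch "A1" <*> fetch "A2")

-- Hypothetical callback: A1 = 10, every other cell = 20.
fetchCell :: String -> State (String -> Bool) Integer
fetchCell k = return (if k == "A1" then 10 else 20)

-- Rebuild B1, whose previously stored value is 0, under a given dirty-bit predicate.
rebuildB1 :: (String -> Bool) -> Integer
rebuildB1 = evalState (run (dirtyBitRebuilder "B1" 0 b1) fetchCell)
```

When B1 is marked dirty the task runs and produces 30; when no key is dirty the stale stored value 0 is returned unchanged, so correctness depends entirely on the dirty bits being set properly.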
