Build Systems à la Carte
1 October 2019, London
Andrey Mokhov @andreymokhov
Joint work with Neil Mitchell and Simon Peyton Jones
In the beginning…
The new world
Make
Ninja (originally Chromium browser)
Shake (originally Standard Chartered)
CloudBuild (Microsoft)
Bazel (Google)
Buck (Facebook)
Dune (Jane Street)
The new world
Make
Excel
Ninja (originally Chromium browser)
Shake (originally Standard Chartered)
CloudBuild (Microsoft)
Bazel (Google)
Buck (Facebook)
Nix (package management)
???
Dune (Jane Street)
The questions we address
These are huge, complicated artefacts each embodying many,
many design choices, some essential and some incidental.
– Which one should I use? How do they compare with each other?
– What exactly does it mean for a build system to be “correct”?
– Can we combine a good property X from system A with good
property Y from system B? Or are X and Y somehow in conflict?
– What is a “Frankenbuild” and is my system susceptible to it?
– What unexplored variants exist?
What are the “good properties” that we want?
1. Minimality
Full rebuild Partial rebuild
Don’t repeat work unnecessarily
2. Early cutoff
Add a comment to main.c and rebuild
Stop when nothing changes
3. Cloud build
Shallow build Partial rebuild
Save repeating work by sharing
build results among all developers
4. Dynamic dependencies
Dependencies of a task depend on the values of other dependencies
Example in Excel
Tasks:
      A     B                      C
1           =INDIRECT("A" & C1)
2
Values:
      A     B     C
1     10    10    1
2     20
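The INDIRECT formula above can be sketched as a task with Monad structure. This is a minimal illustration, assuming the Task and Tasks types defined later in the talk; the names indirect, eval and inputs1 are hypothetical and do not come from the slides.

```haskell
{-# LANGUAGE ConstraintKinds, RankNTypes #-}
import Data.Functor.Identity (Identity (..))
import qualified Data.Map as Map

newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)
type Tasks c k v = k -> Maybe (Task c k v)

-- B1 = INDIRECT("A" & C1): which A-cell is needed depends on the value of C1.
indirect :: Tasks Monad String Integer
indirect "B1" = Just (Task (\fetch -> do
    c1 <- fetch "C1"              -- first dependency, always needed
    fetch ("A" ++ show c1)))      -- second dependency, known only at run time
indirect _ = Nothing

-- Hypothetical helper: evaluate a cell against a map of input values.
-- Instantiating f to Identity runs the task without any side effects.
eval :: Map.Map String Integer -> String -> Integer
eval inputs = runIdentity . go
  where
    go k = case indirect k of
      Nothing          -> Identity (inputs Map.! k)
      Just (Task task) -> task go

-- The input cells from the slide: A1 = 10, A2 = 20, C1 = 1.
inputs1 :: Map.Map String Integer
inputs1 = Map.fromList [("A1", 10), ("A2", 20), ("C1", 1)]
```

With these inputs, eval inputs1 "B1" reads C1 and then A1. Changing C1 to 2 makes the same task read A2 instead, which is why the dependency cannot be known before the task runs.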
Dynamic dependencies are super important
Dynamic dependencies are ubiquitous
– For example, #include files
Numerous workarounds
– GHC’s Make-based build system has multiple “build phases”
– Each phase: 1) Analyses files to generate dependencies
2) Adds those dependencies to the Makefile
3) Runs Make again
But endlessly painful, fragile, and doesn’t really work
Our conclusion: we really really want dynamic dependencies
#ifdef LINUX
#include "linux.h"
#endif
Making it concrete
Building small, but fully executable, build systems in Haskell
Vocabulary
Goal: bring up to date a store that maps keys to values
Key (k): name of a thing. Typical build system: file name. Excel: cell address.
Value (v): value of the thing. Typical build system: file contents. Excel: value of the cell.
Store: maps a key to its value. Typical build system: file system. Excel: grid.
Task (user specified): how to compute the new value of a key, given values of its dependencies. Typical build system: build rules. Excel: formulas.
Dependencies (of a task): the keys whose values must be known before the task can complete.
Build systems
Goal: bring up to date a store that maps keys to values
type Build c i k v = Tasks c k v -> k -> Store i k v -> Store i k v
type Tasks c k v = k -> Maybe (Task c k v)
Nothing ⇒ this key is an input
k  Keys
v  Values
i  Information that the build system keeps from one run to the next
c  The effect structure that a Task can have, typically c = Applicative or c = Monad
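The Store type is used above but not defined on the slides. The following is a minimal sketch of one possible representation, assuming a function-backed store. getValue and putValue are used later in the talk; the record fields and initialise, getInfo and putInfo are assumptions made for illustration.

```haskell
-- A store pairs persistent build information (i) with a total
-- mapping from keys to values.
data Store i k v = Store { info :: i, values :: k -> v }

-- Create a store from initial information and an initial key/value mapping.
initialise :: i -> (k -> v) -> Store i k v
initialise = Store

getInfo :: Store i k v -> i
getInfo = info

putInfo :: i -> Store i k v -> Store i k v
putInfo i s = s { info = i }

getValue :: k -> Store i k v -> v
getValue k s = values s k

-- Overwrite the value of a single key, leaving all other keys untouched.
putValue :: Eq k => k -> v -> Store i k v -> Store i k v
putValue k v s = s { values = \key -> if key == k then v else values s key }
```

A function-backed store keeps the sketch short. A real build system would more likely read from the file system or a database.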
Tasks
main.exe: util.o main.o
    gcc util.o main.o -o main.exe
util.o: util.h util.c
    gcc -c util.c
main.o: util.h main.c
    gcc -c main.c
Slide annotations: input keys; a build task and the corresponding output key
Extracting static dependencies
d(main.exe) = {util.o, main.o}
d(util.o) = {util.h, util.c}
d(main.o) = {util.h, main.c}
The Makefile language allows the users to specify tasks only with static
dependencies. As functional programmers say, it has Applicative structure.
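The Makefile above can be sketched as tasks with Applicative structure, from which the dependency sets d(...) are computed without running anything. This is an illustrative encoding, assuming the Task type and the dependencies function shown later in the talk; makefile, compile, link and d are hypothetical names, and the two "commands" are stand-ins that only combine strings.

```haskell
{-# LANGUAGE ConstraintKinds, RankNTypes #-}
import Data.Functor.Const (Const (..))

newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)
type Tasks c k v = k -> Maybe (Task c k v)

-- Hypothetical stand-ins for gcc: they only combine the input contents.
compile, link :: String -> String -> String
compile header source = "obj(" ++ header ++ "," ++ source ++ ")"
link a b = "exe(" ++ a ++ "," ++ b ++ ")"

-- Each rule fetches a fixed list of keys, so Applicative is enough.
makefile :: Tasks Applicative FilePath String
makefile "main.exe" = Just (Task (\fetch -> link    <$> fetch "util.o" <*> fetch "main.o"))
makefile "util.o"   = Just (Task (\fetch -> compile <$> fetch "util.h" <*> fetch "util.c"))
makefile "main.o"   = Just (Task (\fetch -> compile <$> fetch "util.h" <*> fetch "main.c"))
makefile _          = Nothing   -- no rule: the key is an input

-- Run the task in the Const functor: every fetch just records its key.
dependencies :: Task Applicative k v -> [k]
dependencies (Task task) = getConst (task (\k -> Const [k]))

-- d(key) from the slide; input keys have no dependencies.
d :: FilePath -> [FilePath]
d key = maybe [] dependencies (makefile key)
```

Because an Applicative task cannot inspect a fetched value before choosing the next key, d lists every key the task will read.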
Classifying tasks
What are interesting choices of c in Task c k v?
– Functor: tasks with exactly one static dependency (Docker)
– Applicative: tasks with static dependencies (Make)
– Selective: tasks with conditional static dependencies (Dune)
– Monad: tasks with dynamic dependencies (Shake)
– MonadPlus/MonadRandom: tasks with non-determinism,
for example A1 = RANDBETWEEN(1,3) in Excel
– MonadState i: tasks with access to persistent information
Schedulers and rebuilders
Our main insight: two axes
Scheduler: finds the order in which to execute tasks
Rebuilder: decides whether to execute a task, or to re-use the previous result
Make Excel
Ninja Shake
CloudBuild Bazel
Buck Nix
Three schedulers, four rebuilders
Rebuilders \ Schedulers     Topological   Restarting   Suspending
Dirty bit                   Make          Excel
Verifying traces            Ninja                      Shake
Constructive traces         CloudBuild    Bazel
Deep constructive traces    Buck                       Nix
There are many, many other build systems, but they almost all live here
Directly reflected in Haskell model
Rebuilders \ Schedulers                    Topological   Restarting   Suspending
Dirty bit (modTimeRebuilder)               Make          Excel
Verifying traces (vtRebuilder)             Ninja                      Shake
Constructive traces (ctRebuilder)          CloudBuild    Bazel
Deep constructive traces (dctRebuilder)    Buck                       Nix
type Scheduler c i ir k v = Rebuilder c ir k v -> Build c i k v
type Rebuilder c ir k v = k -> v -> Task c k v -> Task (MonadState ir) k v
make :: Ord k => Build Applicative (MakeInfo k) k v
make = topological modTimeRebuilder
What good properties do we get?
Build Systems à la Carte
Rebuilders \ Schedulers     Topological   Restarting   Suspending
Dirty bit                   Make          Excel
Verifying traces            Ninja                      Shake
Constructive traces         CloudBuild    Bazel
Deep constructive traces    Buck                       Nix
Build Systems à la Carte: what each choice rules out
Topological schedulers: no dynamic dependencies
Restarting schedulers: not minimal
Dirty bit and verifying traces rebuilders: no cloud builds
Deep constructive traces rebuilders: no cutoff
Build Systems à la Carte
Rebuilders \ Schedulers                    Topological   Restarting   Suspending
Dirty bit (modTimeRebuilder)               Make          Excel
Verifying traces (vtRebuilder)             Ninja                      Shake
Constructive traces (ctRebuilder)          CloudBuild    Bazel        Cloud Shake
Deep constructive traces (dctRebuilder)    Buck                       Nix
cloudShake :: (Ord k, Hashable v) => Build Monad (CT k v) k v
cloudShake = suspending ctRebuilder
Yes, we can have simultaneously:
– Minimality
– Dynamic dependencies
– Early cutoff
– Cloud build
And it’s one line of code :-)
It’s only a model!
– Our models are 20-30 lines for a build system
– Real build systems are millions of lines
– We don’t model:
    – Parallelism
    – Non-determinism
    – Failure and recovery
    – Networks and protocols
    – Caching and eviction (of the cloud cache)
    – Cleaning
    – User interface
    – etc.
Further reading and coding
Read the paper: “Build Systems à la Carte”, ICFP 2018 (29 pages)
– Introduction to how major build systems work
– Definitions of key properties and abstractions
– Store, Task, Scheduler, Rebuilder, Build, build system correctness,
verifying and constructive traces, Frankenbuilds, etc
– Executable build systems models and engineering aspects
Implementation: https://github.com/snowleopard/build
– Blog posts: Inside the paper (reflection on how we wrote this
paper) and The Task abstraction (what we can do with tasks)
Extra slides
Deep constructive traces
Instead of tracing immediate dependencies, trace terminal inputs
Downsides: cannot support early cutoff, can cause Frankenbuilds
Frankenbuilds
1. Initial build
2. Clean up, evict main.prof
3. Build main.prof, then report.txt
Combination of task non-determinism and deep constructive traces
Often leads to segfaulting executables because of inconsistent linking
Tasks
newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)
Task: how to build a value when given a way to build its dependencies
(k -> f v)  Callback: given a key the task needs, compute its value
f v         Computed value of the task
k  Keys
v  Values
f  The computational context in which the task is run, e.g. f = IO or f = State (Map k v)
c  The effects that a Task can have, typically c = Applicative or c = Monad
Task example
      A     B
1     10    =A1 + A2
2     20    =B1 * 2
sprsh1 :: Tasks Applicative String Integer
sprsh1 "B1" = Just $ Task $ \fetch -> ((+) <$> fetch "A1" <*> fetch "A2")
sprsh1 "B2" = Just $ Task $ \fetch -> ((*2) <$> fetch "B1")
sprsh1 _ = Nothing
newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)
type Tasks c k v = k -> Maybe (Task c k v)
Extracting static dependencies
The Const functor is defined in the standard module Data.Functor.Const:
newtype Const m a = Const { getConst :: m }
instance Monoid m => Applicative (Const m) where
    pure _ = Const mempty
    Const x <*> Const y = Const (x <> y)
dependencies :: Task Applicative k v -> [k]
dependencies (Task task) = getConst $ task (\k -> Const [k])
newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)
Build system example
busy :: Eq k => Build Monad () k v
busy tasks key store = execState (fetch key) store
  where
    fetch :: k -> State (Store () k v) v
    fetch k = case tasks k of
        Nothing -> get k
        Just (Task task) -> do
            v <- task fetch
            put k v
            return v
type Build c i k v = Tasks c k v -> k -> Store i k v -> Store i k v
newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)
type Tasks c k v = k -> Maybe (Task c k v)
get :: k -> State (Store () k v) v
put :: k -> v -> State (Store () k v) ()
Invoke task with f = State (Store () k v)
Not a very good build system…
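The busy build system can be made runnable with a few simplifying assumptions. In the sketch below the store is a plain Data.Map with no persistent information, and a tiny State monad is defined inline so that only base and containers are needed. The sprsh task description is a hypothetical Monad version of the earlier spreadsheet example.

```haskell
{-# LANGUAGE ConstraintKinds, RankNTypes #-}
import qualified Data.Map as Map

newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)
type Tasks c k v = k -> Maybe (Task c k v)

-- A tiny State monad, defined here so the sketch needs no extra packages.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in  (h a, s2)

instance Monad (State s) where
  State g >>= f = State $ \s -> let (a, s1) = g s in runState (f a) s1

-- Assumption: the store is a plain Map without persistent build information.
type Store k v = Map.Map k v

busy :: Ord k => Tasks Monad k v -> k -> Store k v -> Store k v
busy tasks key store = snd (runState (fetch key) store)
  where
    fetch k = case tasks k of
      Nothing          -> State $ \s -> (s Map.! k, s)   -- input key: just read it
      Just (Task task) -> do
        v <- task fetch                                  -- always rebuild, no checks
        State $ \s -> ((), Map.insert k v s)             -- record the new value
        return v

-- Hypothetical example: B1 = A1 + A2, B2 = B1 * 2.
sprsh :: Tasks Monad String Integer
sprsh "B1" = Just (Task (\fetch -> (+) <$> fetch "A1" <*> fetch "A2"))
sprsh "B2" = Just (Task (\fetch -> (* 2) <$> fetch "B1"))
sprsh _    = Nothing
```

Calling busy sprsh "B2" on a store holding A1 = 10 and A2 = 20 fills in B1 and B2. Every call recomputes every reachable task, which is why this is not a very good build system.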
Why is a Task so polymorphic?
A task must work for all f that satisfy the constraint c,
such as c = Applicative, or c = Monad
Main idea: use the same task in different ways
– Execute the task by using a stateful f,
such as f = IO (e.g. Make) or f = State (Map k v) (e.g. Excel)
– Extract accurate task dependencies by using a special f,
such as f = Const [k]
newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)
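The same idea extends to tasks with Monad structure, where dependencies can only be observed by running the task. The sketch below picks f to be the pair type ((,) [k]), whose Monad instance in base appends the logged keys, so a single run returns both the value and the keys actually fetched. The names indirectB1, track and cells are hypothetical, and this is a simplified illustration rather than the tracking function from the paper.

```haskell
{-# LANGUAGE ConstraintKinds, RankNTypes #-}

newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)

-- B1 = INDIRECT("A" & C1), a task with a dynamic dependency.
indirectB1 :: Task Monad String Integer
indirectB1 = Task (\fetch -> do
  c1 <- fetch "C1"
  fetch ("A" ++ show c1))

-- Run the task with f = ((,) [k]): each fetch logs its key next to the value.
track :: Task Monad k v -> (k -> v) -> ([k], v)
track (Task task) look = task (\k -> ([k], look k))

-- Input cells: A1 = 10, A2 = 20, C1 = 1 (any other cell is 0).
cells :: String -> Integer
cells "A1" = 10
cells "A2" = 20
cells "C1" = 1
cells _    = 0
```

Here track indirectB1 cells reports that C1 and A1 were read. With C1 set to 2 the same task reports C1 and A2, so the log depends on the input values.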
Scheduler example: topological
topological :: Ord k => Scheduler Applicative i i k v
topological rebuilder tasks target = execState $ mapM_ build order
  where
    build :: k -> State (Store i k v) ()
    build key = case tasks key of
        Nothing -> return ()
        Just task -> do
            store <- get
            let value = getValue key store
                newTask :: Task (MonadState i) k v
                newTask = rebuilder key value task
                fetch :: k -> State i v
                fetch k = return (getValue k store)
            newValue <- liftStore (run newTask fetch)
            modify $ putValue key newValue
    order = topSort (reachable dep target)
    dep k = case tasks k of { Nothing -> []; Just task -> dependencies task }
The scheduler used by Make: build tasks in a topological order
dep: extract static dependencies
order: build tasks in a linear order precomputed before the build
newTask: skips up-to-date keys
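The slide uses topSort and reachable without showing them. The sketch below combines both into one depth-first traversal, assuming the dependency graph is acyclic. buildOrder and makeDeps are hypothetical names, and makeDeps hard-codes the dependencies of the earlier Makefile example.

```haskell
import qualified Data.Set as Set

-- Post-order depth-first traversal from the target: every key appears
-- after all of its dependencies. Assumes the graph has no cycles.
buildOrder :: Ord k => (k -> [k]) -> k -> [k]
buildOrder deps target = reverse (snd (visit (Set.empty, []) target))
  where
    visit (seen, acc) k
      | k `Set.member` seen = (seen, acc)          -- already scheduled
      | otherwise =
          let (seen', acc') = foldl visit (Set.insert k seen, acc) (deps k)
          in  (seen', k : acc')                    -- schedule k after its dependencies

-- Static dependencies of the Makefile example.
makeDeps :: String -> [String]
makeDeps "main.exe" = ["util.o", "main.o"]
makeDeps "util.o"   = ["util.h", "util.c"]
makeDeps "main.o"   = ["util.h", "main.c"]
makeDeps _          = []
```

The resulting order also contains input keys such as util.h. The scheduler above skips them because tasks returns Nothing for inputs.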
Rebuilder example: dirtyBitRebuilder
type Chain k = [k]
type ExcelInfo k = (k -> Bool, Chain k)
dirtyBitRebuilder :: Rebuilder Monad (k -> Bool) k v
dirtyBitRebuilder key value task = Task $ \fetch -> do
    isDirty <- get
    if isDirty key then run task fetch else return value
excel :: Ord k => Build Monad (ExcelInfo k) k v
excel = restarting dirtyBitRebuilder
The rebuilder used by Excel: rebuilds a task if it is marked dirty
Dirty bit
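The dirty-bit decision can be tried in isolation with a simplified variant. In this sketch the dirty predicate is passed as an ordinary argument instead of being read from MonadState, so it is not the rebuilder from the slide; dirtyBit, b1 and runPure are hypothetical names.

```haskell
{-# LANGUAGE ConstraintKinds, RankNTypes #-}
import Data.Functor.Identity (Identity (..))

newtype Task c k v = Task (forall f. c f => (k -> f v) -> f v)

-- Simplification: the dirty predicate is an explicit argument, not monadic state.
-- A clean key returns its stored value; a dirty key re-runs the original task.
dirtyBit :: (k -> Bool) -> k -> v -> Task Monad k v -> Task Monad k v
dirtyBit isDirty key value (Task task) =
  Task (\fetch -> if isDirty key then task fetch else return value)

-- B1 = A1 + A2
b1 :: Task Monad String Integer
b1 = Task (\fetch -> (+) <$> fetch "A1" <*> fetch "A2")

-- Run a task against fixed inputs: A1 = 10, every other key is 20.
runPure :: Task Monad String Integer -> Integer
runPure (Task task) =
  runIdentity (task (\k -> Identity (if k == "A1" then 10 else 20)))
```

With a stale stored value of 1 for B1, marking B1 dirty recomputes it as 30, while a clean B1 keeps the stored 1 without fetching anything.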