Luigi future
erik-bernhardsson
July 29, 2014
Luigi
The past, the present, the future
The history
The long story builder (2009-2010)
XML madness
Only used for one single project (my Master’s thesis)
The long story builder2 (2010-2011)
Everything in Python, but insane amounts of boilerplate
Why luigi?
We wanted to do everything in Python, not XML
How do we use it at Spotify?
The things we got right
Everything is a directed acyclic graph
Makefile style
Tasks specify what they depend on, not what depends on them
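The Makefile-style idea can be illustrated with a toy sketch. This is not Luigi's implementation: the `Task` base class, the `build` helper, and the three example tasks are hypothetical stand-ins written with the standard library only. Each task names only its upstream dependencies, and the runner walks the graph from the requested task.

```python
class Task:
    """Toy stand-in for a task; not Luigi's actual Task class."""
    done = set()  # names of tasks that have already run (shared by all tasks)

    def requires(self):
        return []  # upstream tasks this task depends on

    def run(self):
        Task.done.add(type(self).__name__)

    def complete(self):
        return type(self).__name__ in Task.done


class Extract(Task):
    pass


class Transform(Task):
    def requires(self):
        return [Extract()]  # Transform only knows what it needs


class Report(Task):
    def requires(self):
        return [Transform()]


def build(task, order=None):
    """Run everything `task` depends on first, then `task` itself."""
    order = [] if order is None else order
    if task.complete():
        return order  # already done, nothing to run
    for dep in task.requires():
        build(dep, order)
    task.run()
    order.append(type(task).__name__)
    return order


print(build(Report()))  # ['Extract', 'Transform', 'Report']
```

Asking for `Report` is enough: `Extract` never has to list who consumes it, which is the same shape as a Makefile target listing its prerequisites.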
Do everything in Python
Dependencies often involve algebra that is hard to express in XML
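One kind of "algebra" this may refer to is date arithmetic. As a small sketch (the `required_dates` helper and the rolling-window scenario are assumptions for illustration, not taken from the talk), a rolling aggregate for a given day depends on the daily inputs for the preceding days, which is a one-liner in Python:

```python
import datetime


def required_dates(date, window=7):
    """Dates of the daily inputs a rolling `window`-day aggregate needs."""
    return [date - datetime.timedelta(days=i) for i in range(window)]


deps = required_dates(datetime.date(2014, 7, 29), window=3)
print([d.isoformat() for d in deps])
# ['2014-07-29', '2014-07-28', '2014-07-27']
```

Month and year boundaries are handled by `datetime` itself, which is the part that tends to be painful in a static XML description.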
Centralized scheduler
Overview of everything that’s currently running/scheduled
[Diagram: Luigi worker 1 and Luigi worker 2, each holding part of the task graph (tasks A, B, C, F), both connected to the Luigi central planner]
Triggering jobs locally is trivial
If the only way is to run things remotely, debugging is super hard
Running things locally makes it a lot easier
No messing around with paths and configuration
(this has a flip side – more on this later)
It’s a library more than a framework
Avoid the “Hollywood principle” and make it easy to customize etc
The hairy parts…
Execution is tied to scheduling
You can’t run this task “in the cloud” and go away
Visualization is pretty rudimentary
See how nice Driven looks, for instance
Scheduling isn’t tied to triggering
Need to rely on crontab etc.
Could borrow some of the nice parts of Chronos
What are some ideas for the future?
Separate scheduling and execution
Schedule something to run later/somewhere else
Recent baby step towards this is a very simple fix for running modules dynamically:
    $ luigi --module MyModule MyTask --foo xyz --bar 123
The next step would be to do something like
    $ luigi --module MyModule MyTask --foo xyz --bar 123 --execute-remotely
A full implementation would include a bunch of command line options to probe status, kill tasks, etc.
Separate scheduling and execution (2)
[Diagram: the Luigi central scheduler connected to several workers]
On-the-fly dependencies
class MyTask(luigi.Task):
    def run(self):
        input = yield OtherTask()  # this could replace requires()
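To show how a `yield`-based dependency could work mechanically, here is a toy runner built on plain Python generators. It is a sketch under assumptions, not Luigi code: `other_task`, `my_task`, and `run` are hypothetical names, and tasks are modelled as generator functions rather than classes.

```python
def other_task():
    """Hypothetical upstream task: a generator with no dependencies."""
    return "other output"
    yield  # unreachable; its presence makes this function a generator


def my_task():
    """Hypothetical task that discovers its dependency while running."""
    upstream = yield other_task()  # ask the runner to run a dependency
    return "my_task saw: " + upstream


def run(task):
    """Drive a task generator, running each yielded dependency first."""
    result = None
    try:
        while True:
            dep = task.send(result)  # the first send(None) starts the generator
            result = run(dep)        # run the dependency, then feed its result back
    except StopIteration as stop:
        return stop.value            # the task's return value


print(run(my_task()))  # my_task saw: other output
```

The task body pauses at each `yield`, the runner satisfies the dependency, and execution resumes with the dependency's result, so the dependency list never has to be known up front.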
Built in crontab-replacement
@luigi.schedule
class MyTask(luigi.Task):
    param = luigi.DateParameter(default=datetime.date.today())
    def run(self):
        …

The @luigi.schedule decorator would then
1. Register that my_module.MyTask should be scheduled (by telling the central planner?)
2. Trigger it continuously from somewhere (central planner?)
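The two steps can be sketched with a plain class decorator. Everything here is hypothetical and standard-library only: `schedule`, `SCHEDULED`, and `trigger_all` are illustrative names, the registry is a local dict rather than the central planner, and the toy `MyTask` does not subclass anything from Luigi.

```python
import datetime

SCHEDULED = {}  # hypothetical registry: "module.ClassName" -> class


def schedule(cls):
    """Toy class decorator: step 1, record the class so it can be triggered."""
    SCHEDULED[cls.__module__ + "." + cls.__name__] = cls
    return cls


@schedule
class MyTask:
    def __init__(self, date):
        self.date = date

    def run(self):
        return "ran for " + self.date.isoformat()


def trigger_all(date):
    """Step 2: what a cron replacement might do on each tick."""
    return [cls(date).run() for cls in SCHEDULED.values()]


print(trigger_all(datetime.date(2014, 7, 29)))  # ['ran for 2014-07-29']
```

The sketch passes the date in on every tick. A default written as `datetime.date.today()` in a class body is evaluated once, when the class is defined, so a long-running trigger process would probably want to supply the date explicitly.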
ETA for tasks
Using a persistent task history database, you could train a simple k-NN classifier to predict how long a task will run
Then use this with the dependency graph to predict when any task will finish
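A minimal sketch of both steps, with made-up data. Since a duration is a number, the sketch uses the regression form of k-NN (averaging the runtimes of the nearest past runs). The feature (input size), the history records, and the `knn_duration` and `eta` helpers are all assumptions for illustration, and the ETA assumes every task starts the moment its dependencies finish.

```python
def knn_duration(history, features, k=3):
    """Average duration of the k past runs whose features are closest."""
    def dist(run):
        return sum((a - b) ** 2 for a, b in zip(run[0], features))
    nearest = sorted(history, key=dist)[:k]
    return sum(duration for _, duration in nearest) / len(nearest)


def eta(task, deps, duration):
    """Finish time of `task` if each task starts as soon as its deps finish."""
    start = max((eta(d, deps, duration) for d in deps.get(task, [])), default=0.0)
    return start + duration[task]


# Hypothetical history: ((input size in GB,), runtime in minutes)
history = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
duration = {
    "A": knn_duration(history, (2.0,)),        # neighbours at 1, 2, 3 GB -> 20.0
    "B": knn_duration(history, (10.0,), k=1),  # nearest run is 10 GB -> 100.0
    "C": 5.0,
}
deps = {"C": ["A", "B"]}  # C needs both A and B

print(eta("C", deps, duration))  # 105.0
```

`C` cannot start until the slower of `A` and `B` is done, so its predicted finish is 100 + 5 minutes. A real version would also have to account for limited worker capacity.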
More features in the central planner
Kill a task
Re-launch a task
Launch a new task
Support for other languages
Luigi is written in Python – but the RPC is language agnostic.
Happy plumbing!
Questions?