Luigi future

July 29, 2014 Luigi The past, the present, the future

description

Luigi is a workflow manager in Python that I've open sourced. These are slides about Luigi's future from a meetup on July 31.

Transcript of Luigi future

Page 1: Luigi future

July 29, 2014

Luigi

The past, the present, the future

Page 2: Luigi future

The history

Page 3: Luigi future

The long story builder (2009-2010)
XML madness
Only used for one single project (my Master’s thesis)

Page 4: Luigi future

The long story builder2 (2010-2011)
Everything in Python, but insane amounts of boilerplate

Page 5: Luigi future

Why Luigi?

We wanted to do everything in Python, not XML

Page 6: Luigi future

How do we use it at Spotify?

Page 7: Luigi future

Page 8: Luigi future

The things we got right

Page 9: Luigi future

Everything is a directed acyclic graph

Makefile style
Tasks specify what they depend on, not what depends on them
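The Makefile-style idea can be illustrated with a toy sketch. This is not Luigi's implementation: the Task base class, the shared `done` set and the `build` function below are hypothetical stand-ins written for this example. Each task names only what it depends on, and a small resolver walks the graph so that dependencies run first and completed tasks are skipped.

```python
class Task:
    """Toy task: declares its own dependencies, Makefile style."""
    done = set()  # names of tasks that have already run (shared, for the sketch)

    def requires(self):
        return []  # by default a task depends on nothing

    def complete(self):
        return type(self).__name__ in Task.done

    def run(self):
        Task.done.add(type(self).__name__)


class A(Task):
    pass


class B(Task):
    def requires(self):
        return [A()]


class C(Task):
    def requires(self):
        return [A(), B()]


def build(task, order=None):
    """Run dependencies first, then the task itself; skip completed tasks."""
    order = [] if order is None else order
    if task.complete():
        return order
    for dep in task.requires():
        build(dep, order)
    task.run()
    order.append(type(task).__name__)
    return order


order = build(C())
print(order)  # ['A', 'B', 'C']
```

Note that A is required twice (by B and by C) but runs only once, because the second visit finds it already complete.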

Page 10: Luigi future

Do everything in Python

Dependencies often involve algebra that is hard to express in XML
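One common kind of dependency algebra is date arithmetic, for example an aggregate for a given date that needs the seven preceding daily inputs. The sketch below is only an illustration; the `daily_dependencies` helper is a hypothetical name, not part of Luigi.

```python
import datetime


def daily_dependencies(date, days=7):
    """Return the dates of the `days` daily inputs ending the day before `date`."""
    return [date - datetime.timedelta(days=i) for i in range(days, 0, -1)]


deps = daily_dependencies(datetime.date(2014, 7, 29))
print(deps[0], deps[-1])  # 2014-07-22 2014-07-28
```

In Python this is a one-line list comprehension, and month boundaries are handled by `datetime` itself.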

Page 11: Luigi future

Centralized scheduler

Overview of everything that’s currently running/scheduled

[Diagram: Luigi worker 1 and Luigi worker 2, each with its own task graph (nodes labeled A, B, C and F), both connected to the Luigi central planner]

Page 12: Luigi future

Triggering jobs locally is trivial

If the only way is to run things remotely, debugging is super hard
Running things locally makes it a lot easier
No messing around with paths and configuration
(this has a flip side – more on this later)

Page 13: Luigi future

It’s a library more than a framework

Avoid the “Hollywood principle” and make it easy to customize etc

Page 14: Luigi future

The hairy parts…

Page 15: Luigi future

Execution is tied to scheduling

You can’t run this task “in the cloud” and go away

Page 16: Luigi future

Visualization is pretty rudimentary

See how nice Driven looks, for instance

Page 17: Luigi future

Scheduling isn’t tied to triggering

Need to rely on crontab etc
Could borrow some of the nice parts of Chronos

Page 18: Luigi future

What are some ideas for the future?

Page 19: Luigi future

Separate scheduling and execution

Schedule something to run later/somewhere else

Recent baby step towards this is a very simple fix for running modules dynamically:

    $ luigi --module MyModule MyTask --foo xyz --bar 123

The next step would be to do something like

    $ luigi --module MyModule MyTask --foo xyz --bar 123 --execute-remotely

A full implementation would include a bunch of command line options to probe status, kill tasks, etc

Page 20: Luigi future

Separate scheduling and execution (2)

[Diagram: the Luigi central scheduler connected to a pool of workers (Worker, Worker, Worker, Worker, ...)]

Page 21: Luigi future

On-the-fly dependencies

    class MyTask(luigi.Task):
        def run(self):
            input = yield OtherTask() # this could replace requires()
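How a worker might drive such a run() method can be sketched with plain Python generators. This is a toy illustration under stated assumptions, not Luigi code: the `execute` function and the stand-in `OtherTask` and `MyTask` classes are hypothetical. The task yields a dependency, the runner executes it and sends the result back into the generator.

```python
class OtherTask:
    """Hypothetical dependency whose result is produced when it runs."""
    def run(self):
        return "output of OtherTask"


class MyTask:
    def __init__(self):
        self.result = None

    def run(self):
        # Yield a dependency; the runner sends its result back in.
        data = yield OtherTask()
        self.result = "MyTask saw: " + data


def execute(task):
    """Drive a task whose run() may be a generator yielding dependencies."""
    gen = task.run()
    if not hasattr(gen, "send"):
        return gen  # plain run(): nothing was yielded, gen is the result
    try:
        dep = next(gen)
        while True:
            dep = gen.send(execute(dep))  # run the dependency, resume the task
    except StopIteration:
        return None


task = MyTask()
execute(task)
print(task.result)  # MyTask saw: output of OtherTask
```

Because the dependency is only discovered while run() is executing, it can depend on data computed earlier in the same method, which a static requires() cannot express.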

Page 22: Luigi future

Built-in crontab replacement

    import datetime
    import luigi

    @luigi.schedule
    class MyTask(luigi.Task):
        param = luigi.DateParameter(default=datetime.date.today())
        def run(self):
            …

The @luigi.schedule decorator would then
1. Register that my_module.MyTask should be scheduled (by telling the central planner?)
2. Trigger it continuously from somewhere (central planner?)
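The two steps could look roughly like the toy sketch below. Everything here is an assumption made for illustration: the `schedule` decorator, the `SCHEDULED` registry and `trigger_all` are hypothetical, and the registry lives in the local process rather than in the central planner. One detail worth noting: a default computed at class definition time stays fixed for the life of the process, so the sketch computes the date per trigger instead.

```python
import datetime

SCHEDULED = []  # hypothetical registry of task classes to trigger


def schedule(cls):
    """Class decorator: step 1, register the task class for scheduling."""
    SCHEDULED.append(cls)
    return cls


@schedule
class MyTask:
    def __init__(self, param=None):
        # Evaluate the default at call time so each trigger gets the current date.
        self.param = param or datetime.date.today()

    def run(self):
        return "ran for %s" % self.param


def trigger_all(registry, date):
    """Step 2, one tick of a trigger loop: run every registered task for `date`."""
    return [cls(date).run() for cls in registry]


results = trigger_all(SCHEDULED, datetime.date(2014, 7, 29))
print(results)  # ['ran for 2014-07-29']
```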

Page 23: Luigi future

ETA for tasks

Using a persistent task history database, you could train a simple k-NN classifier to predict how long a task will run

Then use this with the dependency graph to predict when any task will finish
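A minimal sketch of both steps, under several assumptions: the only feature is a hypothetical input size, the history is a made-up in-memory list rather than a task history database, and there are enough workers that independent tasks run in parallel. Since the target is a duration, the sketch averages the nearest neighbours (k-NN regression) rather than classifying.

```python
def knn_predict(history, size, k=3):
    """Predict a runtime as the mean runtime of the k runs with the closest input size.

    `history` is a list of (input_size, runtime_seconds) pairs.
    """
    nearest = sorted(history, key=lambda run: abs(run[0] - size))[:k]
    return sum(runtime for _, runtime in nearest) / len(nearest)


def finish_time(task, deps, runtime, cache=None):
    """Earliest finish: own runtime plus the latest finish among dependencies."""
    cache = {} if cache is None else cache
    if task not in cache:
        start = max((finish_time(d, deps, runtime, cache) for d in deps[task]),
                    default=0.0)
        cache[task] = start + runtime[task]
    return cache[task]


# Made-up history: (input size, runtime in seconds) per task.
history = {
    "A": [(10, 100.0), (20, 200.0), (30, 300.0), (100, 1000.0)],
    "B": [(10, 50.0), (20, 60.0), (30, 70.0)],
    "C": [(10, 10.0), (20, 20.0), (30, 30.0)],
}
deps = {"A": [], "B": [], "C": ["A", "B"]}
runtime = {name: knn_predict(runs, size=20) for name, runs in history.items()}
print(runtime)                          # {'A': 200.0, 'B': 60.0, 'C': 20.0}
print(finish_time("C", deps, runtime))  # 220.0
```

Task C cannot start before its slowest dependency (A, 200 seconds) has finished, so its predicted finish is 200 + 20 seconds.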

Page 24: Luigi future

More features in the central planner

Kill a task
Re-launch a task
Launch a new task

Page 25: Luigi future

Support for other languages

Luigi is written in Python – but the RPC is language agnostic.

Page 26: Luigi future

Happy plumbing!

Page 27: Luigi future

Questions?
