Succeeding in academia despite doing good_software

Hacking academia for fun and profit: thoughts on succeeding in academia despite doing good software. Gaël Varoquaux. Strong is the power of the dark side.

description

Hacking academia for fun and profit: thoughts on succeeding in academia despite doing good software. Keynote I gave at the Scipyconf Argentina 2014 conference. The advancement of science is a noble cause, and academia a fierce battlefield for tenure. Software is seen as a mere technicality, not worth a line on an academic CV. I claim that, on the contrary, software is the new medium of the scientific method. I claim that succeeding in academia can be achieved not despite writing good software but through it. The key is to choose the right battles and to win them. What is the emerging role of software in the scientific workflow? Which software challenges can have impact? How can we balance software quality assurance with the quick-turnaround random walk of research? What does "good design" mean for research software? What Python patterns can boost productivity and reuse in exploratory scientific computing? I will try to answer these questions, based on my personal experience of growing up to become an academic Pythonista.

Transcript of Succeeding in academia despite doing good_software

Page 1: Succeeding in academia despite doing good_software

Hacking academia for fun and profit
Thoughts on succeeding in academia despite doing good software

Gaël Varoquaux

Strong is the power of the dark side

Page 2: Succeeding in academia despite doing good_software

Hacking academia for fun and profit
Thoughts on succeeding in academia despite doing good software

Gaël Varoquaux

Strong is the power of the dark side

with Python

Page 3: Succeeding in academia despite doing good_software

Hacking academia for fun and profit
Thoughts on succeeding in academia despite doing good software

Gaël Varoquaux

Strong is the power of the dark side

despite good software?

Page 5: Succeeding in academia despite doing good_software

Publish or perish

Want a career?

Broken value system

Only Science matters

Wrong incentives

G Varoquaux 2

Page 8: Succeeding in academia despite doing good_software

Publish or perish

Want a career? Hack academia!

Publishing scientific software matters

[Pradal, Varoquaux, Langtangen, J Computational Science]

G Varoquaux 2

Page 9: Succeeding in academia despite doing good_software

[TL;DR]*

Choose your battles: keeping science in the target

Win them: software production

* Too Long; Didn't Read

G Varoquaux 3

Page 10: Succeeding in academia despite doing good_software

Growing up as a geek scientist

G Varoquaux 4

Page 12: Succeeding in academia despite doing good_software

Growing up as a geek scientist

I did a PhD in quantum physics

Vacuum (leaks)
Electronics (shorts)
Lasers (mis-alignment)

Best training ever for agile project management

G Varoquaux 6

Page 13: Succeeding in academia despite doing good_software

Growing up as a geek scientist

I did a PhD in quantum physics

Vacuum (leaks)
Electronics (shorts)
Lasers (mis-alignment)

Computers were only one of the many moving parts

Matlab
Instrument control

Shaped my vision of computing as a means to an end

G Varoquaux 7

Page 15: Succeeding in academia despite doing good_software

Success

2011: Tenured researcher in computer science

Today: Growing team with data science rock stars

How / why did I switch?
Fernando Perez (IPython), Prabhu Ramachandran (Mayavi), Eric Jones (Enthought), Travis Oliphant (Numpy)...

Learning fast is more important than knowing something

Bye bye physics

G Varoquaux 8

Page 19: Succeeding in academia despite doing good_software

And now...
What I do nowadays:

Machine learning to understand brain function

Cognitive neuroscience: link neural activity to thoughts and cognition

G Varoquaux 9

Page 20: Succeeding in academia despite doing good_software

And now...
What I do nowadays:

Machine learning to understand brain function

Learn a bilateral link between brain activity and cognitive function

G Varoquaux 9

Page 21: Succeeding in academia despite doing good_software

Software is the new medium of the scientific method

Galileo’s notes

G Varoquaux 10

Page 23: Succeeding in academia despite doing good_software

The scientific method

1. Make conjectures
2. Derive predictions
3. Carry out experiments
4. Confirm or refute conjectures

Software is everywhere
Data mining
Computational models
Computer-controlled experiments
Data analysis

G Varoquaux 11

Page 24: Succeeding in academia despite doing good_software

The scientific method

1. Make conjectures
2. Derive predictions
3. Carry out experiments
4. Confirm or refute conjectures

Code is often the very language in which predictions are expressed

Models are now more complex than a simple formula or sentence

G Varoquaux 11

Page 25: Succeeding in academia despite doing good_software

Enabling falsification: reproducible science

Replicating
A 3rd party redoing the work
Code and data made available

Reproducing
New analysis on different data / code coming to the same conclusion

Reusing
Applying the approach to a new problem
Let us enable reusable research

Arguments for BSD license
No strings attached
Can tinker with it

G Varoquaux 12

Page 28: Succeeding in academia despite doing good_software

Reusable science ⇒ evidence accumulation

Accumulation of scientific knowledge and learning formal representations

Akin to a review paper of the field
But a mathematical model is more testable

Machine learning: engineering knowledge from data

“A theory is a good theory if it satisfies two requirements: It must accurately describe a large class of observations on the basis of a model that contains only a few arbitrary elements, and it must make definite predictions about the results of future observations.”

Stephen Hawking, A Brief History of Time.

G Varoquaux 13

Page 30: Succeeding in academia despite doing good_software

The sweet spots across science and software

G Varoquaux 14

Page 31: Succeeding in academia despite doing good_software

The advancement of knowledge
Imagine a circle that contains human knowledge

Courtesy of Matt Might, via Stefan van der Walt

G Varoquaux 15

Page 32: Succeeding in academia despite doing good_software

The advancement of knowledge
By the time you finish elementary school, you know a little

Page 33: Succeeding in academia despite doing good_software

The advancement of knowledge
High school takes you a little bit further

Page 34: Succeeding in academia despite doing good_software

The advancement of knowledge
With a bachelor’s degree, you gain a speciality

Page 35: Succeeding in academia despite doing good_software

The advancement of knowledge
A master’s degree deepens this speciality

Page 36: Succeeding in academia despite doing good_software

The advancement of knowledge
Research papers take you to the edge of human knowledge

Page 37: Succeeding in academia despite doing good_software

The advancement of knowledge
Once you are at the boundary, you focus

Page 38: Succeeding in academia despite doing good_software

The advancement of knowledge
You push at the boundary for a few years

Page 39: Succeeding in academia despite doing good_software

The advancement of knowledge
And one day it yields

Page 40: Succeeding in academia despite doing good_software

The advancement of knowledge
That dent you’ve made is called a PhD

Page 41: Succeeding in academia despite doing good_software

The advancement of knowledge
Of course, the world looks different to you now

Page 42: Succeeding in academia despite doing good_software

The advancement of knowledge
But don’t forget the big picture

PhD

Page 44: Succeeding in academia despite doing good_software

The advancement of knowledge
This is an optimistic view

Biology
Maths
Computer science
Physics
Economy
Literature
History

I want to be there

G Varoquaux 15

Page 45: Succeeding in academia despite doing good_software

Translational computational science

Computational science
The use of computers and mathematical models to address scientific research

Translational science
In medicine: bring bench science to medical practice

Translational computational science?

G Varoquaux 16

Page 48: Succeeding in academia despite doing good_software

Pick a problem to work on

Take the “easy” route
There needs to be a market screaming for the software (in academia and in industry)

Refine your vision

Pull, not push
Design driven by need

G Varoquaux 17

Page 49: Succeeding in academia despite doing good_software

Having an impact

G Varoquaux 18

Page 51: Succeeding in academia despite doing good_software

Pick the right battles: viable projects

Project idea
A software implementing:
i) machine learning
and ii) neuroimaging
and iii) a graphical user interface
and iv) 3D plotting

Define project scope and vision
Break down projects by expertise
Don’t solve hard problems
Know the software landscape
Don’t target markets that will not yield contributors

Need a vision = elevator pitch

Your research (PhD) probably does not qualify ⇒ need to cherry-pick contributions

G Varoquaux 19

Page 55: Succeeding in academia despite doing good_software

Open source and community development

Code maintenance too expensive to be alone
scikit-learn ~ 300 email/month
nipy ~ 45 email/month
joblib ~ 45 email/month
mayavi ~ 30 email/month

“Hey Gael, I take it you’re too busy. That’s okay, I spent a day trying to install XXX and I think I’ll succeed myself. Next time though please don’t ignore my emails, I really don’t like it. You can say, ‘sorry, I have no time to help you.’ Just don’t ignore.”

Your “benefits” come from a fraction of the code
Data loading? Maybe?
Standard algorithms? Nah

Share the common code...
...to avoid dying under code

Code becomes less precious with time
And somebody might contribute features

G Varoquaux 20

Page 57: Succeeding in academia despite doing good_software

Community development in scikit-learn

Huge feature set: benefits of a large team

Project growth:
More than 200 contributors
~ 12 core contributors

1 full-time INRIA programmer from the start

Estimated cost of development: $6 million
COCOMO model, http://www.ohloh.net/p/scikit-learn

G Varoquaux 21

Page 58: Succeeding in academia despite doing good_software

Communities: many eyes make code fast

L. Buitinck, O. Grisel, A. Joly, G. Louppe, J. Nothman, P. Prettenhofer

G Varoquaux 22

Page 59: Succeeding in academia despite doing good_software

Having an impact

You need a community

G Varoquaux 23

Page 60: Succeeding in academia despite doing good_software

What’s in a scientific-computing environment

G Varoquaux 24

Page 61: Succeeding in academia despite doing good_software

The scientific workflow: agile

Interaction... → script... → module...
↻ interaction again...

Consolidation, progressively

Low tech and short turn-around times

G Varoquaux 25

Page 63: Succeeding in academia despite doing good_software

Choose your weapons

Python, what else?
Interactive language
Easy to read / write
General purpose
Old virtual machine / compiler
Younger languages promising (Julia)

but will they get adoption beyond science?

G Varoquaux 26

Page 64: Succeeding in academia despite doing good_software

Choose your weapons

Python, what else?
+ Numpy arrays
Shoe-horn your data in a numpy array, and you’ve won

personally disappointed that pandas drifted away

G Varoquaux 26

Page 65: Succeeding in academia despite doing good_software

Software architecture for science

“Scriptability” is paramount

In an application: MVC (model, view, controller)

Model: numerical or data-processing core
View: output (graphs or files); must enable headless use
Controller: input (dialogs or an API); avoid input as files: not expressive

Dialogs should never be far from the code
Dialog generation: traits, IPython widgets
Reactive programming: dialogs modify object, and the model updates
Don’t own the main

In Mayavi: script generation for free
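A minimal sketch of this separation, using only the standard library. The function names (damped_oscillation, write_report, run) are hypothetical and chosen for illustration; this is not how Mayavi or any particular package is organized. The point is that the numerical core does no I/O, the view writes to any file-like object (so it runs headless), and the controller is a plain function that a dialog or a script could both call.

```python
import io
import math


def damped_oscillation(n_points, decay=0.1):
    """Model: pure numerical core, with no I/O and no GUI."""
    return [math.exp(-decay * t) * math.cos(t) for t in range(n_points)]


def write_report(values, stream):
    """View: renders results to any file-like object, so it works headless."""
    for i, value in enumerate(values):
        stream.write(f"{i}\t{value:.4f}\n")


def run(n_points=5, decay=0.1, stream=None):
    """Controller: a plain function API that a dialog or a script can call."""
    values = damped_oscillation(n_points, decay)
    if stream is not None:
        write_report(values, stream)
    return values


# No main loop is owned here: the caller stays in control.
buffer = io.StringIO()
result = run(n_points=3, stream=buffer)
```

Because input arrives as function arguments rather than as a configuration file, the same entry point can be driven from an interactive session, a batch script, or a generated dialog.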

G Varoquaux 27

Page 67: Succeeding in academia despite doing good_software

Quality is free*

* This is a book, by Philip Crosby

G Varoquaux 28

Page 68: Succeeding in academia despite doing good_software

You need quality

Quality will give you users
Bugs give you a bad rap

Quality will give you developers
Contribute to learn and improve

Quality will make your developers happy
People need to be proud of their work

Do less, do better
Goes against the grant-system incentive

G Varoquaux 29

Page 69: Succeeding in academia despite doing good_software

Quality: what & how

Great documentation
Simplify, but don’t dumb down
Focus on what the user is trying to solve

Great APIs
Example-based development
If something is hard to explain, rethink the concepts
Limit the number of different concepts and objects
Consistency, consistency, consistency

Good numerics
Write tests based on mathematical properties
When a user finds an instability, write a new test

Quality enables reuse
Beyond mere reproducibility
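As an illustration of "tests based on mathematical properties", here is a small sketch that checks a variance routine against identities it must satisfy, rather than against hard-coded numbers. The functions variance and check_variance_properties are hypothetical examples written for this note; the tolerances are assumptions that suit this data, not universal values.

```python
import math
import random


def variance(values):
    """Two-pass population variance: subtract the mean first for stability."""
    mean = math.fsum(values) / len(values)
    return math.fsum((v - mean) ** 2 for v in values) / len(values)


def check_variance_properties(values, shift=1e6, scale=3.0):
    """Check mathematical properties rather than hard-coded numbers."""
    base = variance(values)
    # Property 1: a variance is never negative.
    assert base >= 0.0
    # Property 2: adding a constant does not change the variance.
    shifted = variance([v + shift for v in values])
    assert math.isclose(shifted, base, rel_tol=1e-6)
    # Property 3: scaling the data by c scales the variance by c**2.
    scaled = variance([scale * v for v in values])
    assert math.isclose(scaled, scale ** 2 * base, rel_tol=1e-9)
    return True


rng = random.Random(0)  # fixed seed so the check is repeatable
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
properties_hold = check_variance_properties(data)
```

The shift-invariance check is the kind of test that catches numerical instabilities: a naive one-pass formula (mean of squares minus square of the mean) tends to fail it for large shifts.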

G Varoquaux 30

Page 72: Succeeding in academia despite doing good_software

Be productive

“If you spend too much time thinking about a thing, you’ll never get it done.” (Bruce Lee)

G Varoquaux 31

Page 73: Succeeding in academia despite doing good_software

Limited resources

Limited resources are good
Need success in the short term, not the long term

The startup culture: fail fast
Quickly identify non-viable projects

The simplest solution that works is the best

G Varoquaux 32

Page 74: Succeeding in academia despite doing good_software

Short cycles, limited ambitions

Keep coming back to your users
Release early, release often

G Varoquaux 33

Page 75: Succeeding in academia despite doing good_software

Simplicity

Complexity increases superlinearly
[An Experiment on Unit Increase in Problem Complexity, Woodfield 1979]
25% increase in problem complexity ⇒ 100% increase in code complexity

The 80/20 rule
80% of the use cases can be solved with 20% of the lines of code

Avoid feature creep

Use objects sparingly
Don’t use classes for the sake of it
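To get a feel for what "superlinearly" means with the figures quoted above, one can read them as a power law. This is an assumption made for this sketch only, not a model claimed by the cited study: if code complexity grows as problem complexity to the power k, then 1.25 ** k = 2 fixes k.

```python
import math

problem_growth = 1.25  # +25% problem complexity (figure quoted on the slide)
code_growth = 2.0      # +100% code complexity (figure quoted on the slide)

# Assumed model: code_complexity is proportional to problem_complexity ** k.
exponent = math.log(code_growth) / math.log(problem_growth)


def code_complexity_factor(problem_factor, k=exponent):
    """Growth in code complexity implied by the assumed power law."""
    return problem_factor ** k


# Under this assumption, what does doubling the problem scope cost?
doubling_cost = code_complexity_factor(2.0)
```

Under this reading the exponent is a little above 3, so doubling the scope of a project would multiply code complexity by more than 8, which is one way to motivate avoiding feature creep.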

G Varoquaux 34

Page 78: Succeeding in academia despite doing good_software

Software engineering

G Varoquaux 35

Page 79: Succeeding in academia despite doing good_software

Software engineering good practices

Version control
Use git + github

Unit testing
If it’s not tested, it’s broken or soon will be.

Make a package, with controlled dependencies and compilation

...

G Varoquaux 36

Page 80: Succeeding in academia despite doing good_software

Research ≠ production

Need to adapt software-engineering principles

↓
Good naming is free
Use functions, not scripts
Version control is very cheap
Tests are more expensive... Consider them if goals stabilize
Build chains are hard

Go down the chain as your research progresses
You can think of shipping software only if it was viable to go completely down the chain
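A small sketch of the first steps down that chain: the same computation moved from script style into a named function, then given a cheap test. The names center and test_center_has_zero_mean are hypothetical, chosen for illustration.

```python
import statistics

# Script style (hard to reuse or test): values and steps live at module level.
# measurements = [2.0, 4.0, 6.0]
# centered = [m - sum(measurements) / len(measurements) for m in measurements]


def center(measurements):
    """Subtract the mean so the step can be reused on any dataset."""
    mean = statistics.fmean(measurements)
    return [m - mean for m in measurements]


def test_center_has_zero_mean():
    """A cheap test to add once the goal of the analysis has stabilized."""
    centered_values = center([2.0, 4.0, 6.0])
    assert abs(statistics.fmean(centered_values)) < 1e-12


centered = center([2.0, 4.0, 6.0])
test_center_has_zero_mean()
```

The function costs almost nothing over the script, while the test is only worth writing once the function is likely to stay.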

G Varoquaux 37

Page 81: Succeeding in academia despite doing good_software

Things we did right (maybe)

G Varoquaux 38

Page 82: Succeeding in academia despite doing good_software

Mayavi: 3D visualization in Python

Success factors
Building upon VTK: great power
Component model (UI)
Internals open to the world ⇒ from interaction to scripting

Limiting factors
Building upon VTK: a lot of complexity
Codebase too complex and object-oriented (bound to VTK)
Users of GUIs do not turn into developers
Composition is an API killer

G Varoquaux 39

Page 84: Succeeding in academia despite doing good_software

joblib: computational workflow patterns

Parallel for loop
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(sqrt)(i**2)
...                    for i in range(8))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

On-demand dispatch to ease memory consumption
Threading and processes backends

G Varoquaux 40

Page 85: Succeeding in academia despite doing good_software

joblib: computational workflow patterns

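Parallel for loop
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(sqrt)(i**2)
...                    for i in range(8))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

Memoize pattern
import joblib
mem = joblib.Memory(location='.')  # this argument was named cachedir in older joblib releases
g = mem.cache(f)  # f is any function whose results are worth caching
b = g(a)  # computes f(a)
c = g(a)  # retrieves the result from the store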
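To make the memoize pattern concrete without depending on joblib, here is a standard-library sketch of the idea: results are stored on disk under a key derived from the arguments, and reused on the next call. The helper disk_cache and the function slow_square are hypothetical names; this is an illustration of the pattern, not joblib's implementation, and a real library has to handle more cases (changes to the function's code, large arrays, concurrent writes).

```python
import hashlib
import os
import pickle
import tempfile


def disk_cache(func, cache_dir):
    """Wrap func so that results are pickled to cache_dir and reused."""
    def wrapper(*args):
        # Key the cache on the function name and the pickled arguments.
        key = hashlib.sha256(pickle.dumps((func.__name__, args))).hexdigest()
        path = os.path.join(cache_dir, key + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as stored:
                return pickle.load(stored)  # cache hit: skip the computation
        result = func(*args)
        with open(path, "wb") as stored:
            pickle.dump(result, stored)
        return result
    return wrapper


calls = []


def slow_square(x):
    calls.append(x)  # records every real computation
    return x * x


with tempfile.TemporaryDirectory() as cache_dir:
    cached_square = disk_cache(slow_square, cache_dir)
    first = cached_square(4)   # computed, then written to disk
    second = cached_square(4)  # read back from disk
```

The list calls ends up with a single entry, which shows that the second call never ran the function.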

G Varoquaux 40

Page 87: Succeeding in academia despite doing good_software

joblib: computational workflow patterns

Success factors
Simplicity of use
Patterns we really, really need (pull not push)

Limiting factors
Vision of the project unclear
Positioning with regards to landscape unclear (IPython, where are you headed?)
Tricky code inside

G Varoquaux 41

Page 88: Succeeding in academia despite doing good_software

scikit-learn: machine learning in Python

Success factors

Right project vision
Machine learning without learning the machinery
Black box that can be opened
Right trade-off between “just works” and versatility (think Apple vs Linux)

We’re not going to solve all the problems for you
I don’t solve hard problems
Feature-engineering, domain-specific cases...
Python is a programming language. Use it.

Cover all the 80% use cases in one package

G Varoquaux 42

Page 89: Succeeding in academia despite doing good_software

scikit-learn: machine learning in Python

Success factors
Right project vision
High-level programming
- Optimize algorithms, not for loops
- Know Numpy and scipy perfectly
  All significant data should be in arrays
  Avoid memory copies, rely on blas/lapack
- Use Cython, not C/C++

G Varoquaux 42

Page 90: Succeeding in academia despite doing good_software

scikit-learn: machine learning in Python

Success factors
Right project vision
High-level programming
Good API design

- separate data from operations

[Figure: blocks of numeric data]

G Varoquaux 42

Page 91: Succeeding in academia despite doing good_software

scikit-learn: machine learning in Python

Success factors
Right project vision
High-level programming
Good API design
- separate data from operations
- Object API exposes a data-processing language: fit, predict, transform, score, partial_fit
- Instantiated without data but with all parameters
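The API principles above can be sketched with a toy estimator in plain Python. MeanRegressor and its shrinkage parameter are hypothetical, written for illustration only and not part of scikit-learn; the sketch only mirrors the conventions listed: parameters at construction, data passed to fit, and a small shared vocabulary of methods.

```python
class MeanRegressor:
    """Toy estimator: predicts the (optionally shrunk) mean of the targets."""

    def __init__(self, shrinkage=0.0):
        # Parameters only: no data is passed at construction time.
        self.shrinkage = shrinkage

    def fit(self, X, y):
        # Learned state is kept apart from the parameters (trailing underscore).
        self.mean_ = (1.0 - self.shrinkage) * sum(y) / len(y)
        return self  # returning self allows chaining: est.fit(X, y).predict(X)

    def predict(self, X):
        return [self.mean_ for _ in X]

    def score(self, X, y):
        # Negative mean squared error, so that higher is better.
        predicted = self.predict(X)
        return -sum((p - t) ** 2 for p, t in zip(predicted, y)) / len(y)


X_train, y_train = [[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0]
model = MeanRegressor(shrinkage=0.0).fit(X_train, y_train)
predictions = model.predict([[5.0], [6.0]])
```

Because the object holds no data until fit is called, the same configured instance can be copied, applied to different datasets, or passed to generic tools that only rely on the method names.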

G Varoquaux 42

Page 92: Succeeding in academia despite doing good_software

scikit-learn: machine learning in Python

Success factors
Right project vision
High-level programming
Good API design
Great community
- Github + code review

G Varoquaux 42

Page 94: Succeeding in academia despite doing good_software

scikit-learn: machine learning in Python

Success factors
Right project vision
High-level programming
Good API design
Great community
Great documentation

Limiting factors
Tricky numerical code
Our own success ⇒ huge volume

G Varoquaux 42

Page 95: Succeeding in academia despite doing good_software

@GaelVaroquaux

Succeeding in academia despite doing good software

1 Game the system
It’s about convincing a tenure committee
Code must contribute to a scientific problem

2 Not all battles can be fought

3 Make good software

Page 96: Succeeding in academia despite doing good_software

@GaelVaroquaux

Succeeding in academia despite doing good software

1 Game the system

2 Not all battles can be fought
Make sure that there is a market
Don’t solve hard problems
Problems that matter for science and industry

3 Make good software

Page 97: Succeeding in academia despite doing good_software

@GaelVaroquaux

Succeeding in academia despite doing good software

1 Game the system

2 Not all battles can be fought

3 Make good software
That actually answers scientists’ needs
With quality, software engineering
Relying on a community
Usability matters

Page 98: Succeeding in academia despite doing good_software

@GaelVaroquaux

Succeeding in academia despite doing good software

1 Game the system

2 Not all battles can be fought

3 Make good software

Now go out, and code!