Computable content: Notebooks, containers, and data-centric organizational learning
Domino Data Science Popup 2017-02-22
Paco Nathan, @pacoid
Dir, Learning Group @ O’Reilly Media
Project Jupyter
Project Jupyter is the evolution of IPython notebooks, applied to a range of different programming languages and environments
https://jupyter.org/
https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
Some history…
Download Anaconda:
continuum.io/downloads
Activate the environment needed:
source activate py3k
Launch Jupyter:
jupyter notebook
An example notebook (requires installs; see notes):
github.com/ceteri/oriole_jupyterday_atl/blob/master/example.ipynb
Installation and launch using Anaconda
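Before opening the example notebook, it can help to confirm from a notebook cell that the kernel is running inside the activated environment and that the extra packages are installed. A minimal sketch, assuming the example's extra dependency is `textblob` (as used on the next slide); `missing_packages` is a hypothetical helper name, not part of Jupyter or Anaconda:

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that this kernel cannot import."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


print(sys.executable)                  # which interpreter the kernel actually runs
print(sys.version_info >= (3,))        # True inside a Python 3 environment such as py3k
print(missing_packages(["textblob"]))  # [] once the example's installs are done
```

If `sys.executable` points outside the Anaconda environment, the kernel was probably launched before `source activate py3k`.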
# requires: pip install textblob
#           python -m textblob.download_corpora
from textblob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact."
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)
print(blob.tags)          # part-of-speech tags as (word, tag) pairs
print(blob.noun_phrases)  # noun phrases extracted from the text
Installation and launch using Anaconda
At its core, one can think of Jupyter as a suite of network protocols:
Jupyter is to the remote semantics of a REPL as HTTP is to the remote semantics of a file share.
A suite of network protocols
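To make the protocol analogy concrete: a notebook front end asks a kernel to run code by sending an `execute_request` message whose JSON parts (header, parent_header, metadata, content) are signed with HMAC-SHA256 and sent over ZeroMQ. The sketch below only builds and signs such a message with the standard library, assuming protocol version 5.x field names; it omits the ZeroMQ routing identities and the socket itself, and the helper names are hypothetical. In practice the `jupyter_client` package handles all of this.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

DELIMITER = b"<IDS|MSG>"  # separates routing identities from the signed message parts


def make_execute_request(code, session_id, username="user"):
    """Build an execute_request message dict (protocol 5.x field names)."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",  # assumed protocol version string
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}


def to_wire(msg, key):
    """Serialize the four JSON parts and prepend the delimiter and HMAC signature."""
    parts = [
        json.dumps(msg[name]).encode("utf-8")
        for name in ("header", "parent_header", "metadata", "content")
    ]
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)  # the signature covers the four parts, in order
    signature = mac.hexdigest().encode("ascii")
    return [DELIMITER, signature] + parts


msg = make_execute_request("print('hello')", session_id=uuid.uuid4().hex)
for frame in to_wire(msg, key=b"example-key"):
    print(frame[:60])
```

The kernel verifies the signature with the shared key from its connection file before executing anything, which is what lets the REPL be exposed over the network.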
JupyterHub
github.com/jupyterhub/jupyterhub
Jupyter in Education
groups.google.com/forum/#!forum/jupyter-education
JupyterLab (alpha preview)
github.com/jupyterlab/jupyterlab
Jupyter Kernels
github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
Projects:
documentation
jupyter.readthedocs.io/en/latest/index.html
discussions
groups.google.com/forum/#!forum/jupyter
gitter.im/jupyter/jupyter
events
calendar.google.com/calendar/embed?src=p51j0ac1iccmj44tae12hq4dk0%40group.calendar.google.com
Resources:
Jupyter @ O’Reilly Media
Embracing Jupyter Notebooks at O'Reilly oreilly.com/ideas/jupyter-at-oreilly
Learn alongside innovators, thought-by-thought, in context oreilly.com/ideas/oreilly-oriole-learn-alongside-innovators-thought-by-thought-in-context
Oriole Online Tutorials safaribooksonline.com/oriole/
How Do You Learn? oreilly.com/learning/how-do-you-learn
For example…
• A unique new medium blends code, data, text, and video into a narrated learning experience with computable content
• Purely browser-based UX; zero installation required
• Substantially higher engagement metrics
• Opens the door for live coding in assessments
• GitHub lists over 300K public Jupyter notebooks
Regex Golf by Peter Norvig: oreilly.com/learning/regex-golf-with-peter-norvig
Motivations
O’Reilly needed a way for authors to use Jupyter notebooks to create professional publications. We also wanted to integrate video narration into the UX. The result is a unique new medium called Oriole:
• Jupyter notebooks are used in the middleware
• each viewer gets a 100% HTML experience (no download/install needed)
• context as a “unit of thought”
• the code and video are synced together
• each web session has a Docker container running in the cloud
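The slides don't say how Oriole keeps the video and the code in sync. As a purely hypothetical illustration of one way to do it, the narration could carry sorted cue points (seconds into the video mapped to a notebook cell), and the player looks up the active cell with a binary search. The cue values and `cell_for_time` are made up for this sketch:

```python
import bisect

# Hypothetical cue points: (seconds into the narration video, notebook cell index),
# sorted by time.
CUES = [(0.0, 0), (42.5, 1), (97.0, 3), (150.0, 4)]


def cell_for_time(t, cues=CUES):
    """Return the index of the cell to show at video time t (in seconds)."""
    starts = [start for start, _ in cues]
    # bisect_right finds the first cue starting after t; the one before it is active
    i = bisect.bisect_right(starts, t) - 1
    return cues[max(i, 0)][1]


for t in (0, 30, 100, 200):
    print(t, "->", cell_for_time(t))
```

A binary search keeps seeking cheap even when a long narration has many cue points, since each lookup is logarithmic in the number of cues.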
Motivations
Innovators in programming, data science, dev ops, design, etc., tend to be really busy people. Tutorials are now much quicker to publish than “traditional” books and videos. The audience gets direct, hands-on, contextualized experience across a wide variety of programming environments.
Literate Programming, Don Knuth literateprogramming.com/
Paraphrased: Instead of telling computers what to do, tell other people what you want the computers to do
Some history
Wolfram Research introduced notebooks in 1988 for working with Mathematica…
Some history
PyCon 2016 Keynote, Lorena Barba youtu.be/ckW1xuGVpug?t=35m11s (video) figshare.com/articles/PyCon2016_Keynote/3407779 (slides)
Highly recommended: speech acts (based on Winograd and Flores) as theory for what we’re doing here
More recently
Notebook Practice
• focus on a concise “unit of thought”
• invest the time and editorial effort to create a good intro
• keep your narrative simple and reasonably linear
• “chunk” the text and code into understandable parts
• alternate between text, code, output, further links, etc.
• use markdown for interesting links: background, deep-dive, etc.
• code cells shouldn’t be long (< 10 lines), must show output
• load data+libraries from the container, not the network
• clear all output then “Run All” – or it didn’t happen
• video narratives: there’s text, and there’s subtext...
• pause after each “beat” – smile, breathe, let people follow you
Tips learned by teaching with Jupyter
For the JVM people: stop thinking only about IDEs, Ivy, Maven, etc. (ibid., Knuth 1984)
BUILD UBER JARS, LOAD LIBS FROM CONTAINER, NOT THE NETWORK! (apologies for shouting)
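Several of these tips can be checked mechanically. A minimal sketch that reads an .ipynb file as plain JSON (nbformat 4 layout: a `cells` list whose code cells carry `source`, `outputs`, and `execution_count`) and flags code cells of 10 or more lines, code cells with no output, and execution counts that aren't 1..N in order. A 1..N sequence is what restarting the kernel and running all cells produces, so this is a slightly stricter reading of the "clear all output then Run All" tip. The function names are hypothetical; the `nbformat` package offers a proper reader (`nbformat.read(path, as_version=4)`) if you'd rather not touch the JSON directly.

```python
import json

MAX_LINES = 10  # the "< 10 lines" rule of thumb from the tips


def source_lines(cell):
    """Return a cell's source as a list of lines (source may be a str or list of str)."""
    src = cell.get("source", "")
    text = "".join(src) if isinstance(src, list) else src
    return text.splitlines()


def lint_notebook(nb):
    """Return human-readable problems found in an nbformat-4 notebook dict."""
    problems = []
    code_cells = [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]
    for n, cell in enumerate(code_cells, start=1):
        length = len(source_lines(cell))
        if length >= MAX_LINES:
            problems.append(f"code cell {n}: {length} lines (keep it under {MAX_LINES})")
        if not cell.get("outputs"):
            problems.append(f"code cell {n}: no output")
    counts = [c.get("execution_count") for c in code_cells]
    if counts != list(range(1, len(code_cells) + 1)):
        problems.append("execution counts are not 1..N in order: restart, clear output, Run All")
    return problems


def lint_file(path):
    """Load an .ipynb file from disk and lint it."""
    with open(path, encoding="utf-8") as f:
        return lint_notebook(json.load(f))
```

For example, `for p in lint_file("example.ipynb"): print(p)` prints nothing for a notebook that follows these tips.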
Jupyter notebooks + Git repos provide a low-cost, pragmatic way toward the practice of repeatable science – in this case, repeatable Data Science
• executable documents
• code + params + results + descriptions
• shareable insights
Notebooks: a cure for silos
In data science, we see the benefits to teams for shared insights, storytelling, etc.
Meanwhile domain expertise is generally more important than knowledge about tools
There’s value for developers in using notebooks in lieu of IDEs in some cases: what are those cases?
GitHub now renders notebooks, so they can be used for documentation, reporting, etc.
Digital Object Identifiers (DOIs) can be assigned through Zenodo, making notebooks citable for academic publication
“Sharing is caring”
Authoring & Scale-Out
Launchbot allows a notebook author to build a container that includes the required Jupyter kernel, installed libraries, datasets, etc.
You need to have Docker installed on your laptop
The backend uses Git and DockerHub to manage containers
For scale, deploy to DC/OS
Achieving scale
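Launchbot's own configuration format isn't shown here, so as a hypothetical sketch of the underlying idea (bake the kernel, libraries, and datasets into the image so nothing is fetched over the network at run time), here is a small Python function that renders a Dockerfile. The base image `jupyter/scipy-notebook` (from the Jupyter Docker Stacks, whose default user is `jovyan`), the `data` directory, and the function name are assumptions for illustration:

```python
def render_dockerfile(base_image, pip_packages, post_install=(), data_dir=None):
    """Render Dockerfile text that bakes libraries and data into the image."""
    lines = [f"FROM {base_image}"]
    if pip_packages:
        # install libraries at build time, not when a reader opens the notebook
        lines.append("RUN pip install --no-cache-dir " + " ".join(pip_packages))
    for cmd in post_install:
        lines.append(f"RUN {cmd}")  # e.g. download corpora once, into the image
    if data_dir:
        # ship datasets inside the container instead of loading them over the network
        lines.append(f"COPY {data_dir}/ /home/jovyan/{data_dir}/")
    return "\n".join(lines) + "\n"


print(render_dockerfile(
    "jupyter/scipy-notebook",
    ["textblob"],
    post_install=["python -m textblob.download_corpora"],
    data_dir="data",
))
```

The rendered file could then be built with `docker build -t my-notebook .` (the tag name is hypothetical) and pushed to DockerHub.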
A notebook, a container, and ~20 minutes of informal video walk into a bar...
O’Reilly Media conferences + training:
NLP in Python: repeated live online courses
Strata SJ, Mar 13-16: Deep Learning sessions, 2-day training
Artificial Intelligence NY, Jun 26-29; SF, Sep 17-20: SF CFP is open, follow @OreillyAI for updates
speaker:
periodic newsletter for updates, events, conf summaries, etc.:
liber118.com/pxn/
@pacoid
A modest proposal
Just Enough Math
Building Data Science Teams
Hylbert-Speys
How Do You Learn?