Data science apps: beyond notebooks
Uploaded by: natalino-busa
Category: Data & Analytics
Transcript of Data science apps: beyond notebooks
3
Icons made by Gregor Cresnar from www.flaticon.com, licensed under CC 3.0 BY
Learning: The Scientific Method
Ørsted's "First Introduction to General Physics" (1811) https://en.m.wikipedia.org/wiki/History_of_scientific_method
observation hypothesis deduction synthesis
Hans Christian Ørsted
experiment
6
The Jupyter Project http://jupyter.org
7
Jupyter notebook: what is it?
The Jupyter Notebook
8
Jupyter notebook: why?
Language of choice
The Notebook has support for over 40 programming languages, including those popular in Data Science such as Python, R, Julia and Scala.
Share notebooks
Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer.
Interactive widgets
Code can produce rich output such as images, videos, LaTeX, and JavaScript. Interactive widgets can be used to manipulate and visualize data in realtime.
Big data integration
Leverage big data tools, such as Apache Spark, from Python, R and Scala. Explore that same data with pandas, scikit-learn, ggplot2, dplyr, etc.
9
Text Cell
Code Cell
Cell Input
Cell Output
Edit, Run, Kernel, Widgets menus
Kernel Type
Cell output: ASCII, HTML, image, etc.
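To make the labels above concrete, here is a minimal sketch of how a notebook file stores a text cell and a code cell with its input and output. It assumes the nbformat 4 JSON layout; the cell contents and the `summarize` helper are hypothetical and only illustrate the structure.

```python
import json

# A minimal notebook in the nbformat 4 layout: one text (markdown) cell
# and one code cell holding both its input ("source") and its output.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        # Kernel type: tells the server which kernel to start for this file.
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3",
            "language": "python",
        },
    },
    "cells": [
        # Text cell (hypothetical content).
        {"cell_type": "markdown", "metadata": {}, "source": "# Hypothetical title"},
        # Code cell: cell input plus the cell output captured at run time.
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "print('hello')",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "hello\n"}
            ],
        },
    ],
}


def summarize(nb):
    """Return (cell_type, number_of_outputs) for every cell in the notebook."""
    return [(cell["cell_type"], len(cell.get("outputs", []))) for cell in nb["cells"]]


# A notebook file is plain JSON, so it round-trips through json.dumps/loads.
serialized = json.dumps(notebook, indent=1)
print(summarize(json.loads(serialized)))
```

Because the file is plain JSON, it can be versioned, diffed, and shared like any other text file.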
10
Architecture of a Jupyter Notebook
∅MQ
Notebook files
HTTP / WebSockets
11
Architecture of a Jupyter Notebook
• Modular architecture:
Web App, Server, Kernel
• Kernels:
Python, R, Scala, Julia, Bash, SPARKQL
• Web App:
Asynchronous, rich editing, syntax highlighting, export and share
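The modular split between web app, server, and kernel works because they exchange structured messages. The sketch below builds the dictionary shape of an `execute_request` as described in the Jupyter messaging specification. The field values (`username`, protocol `version`) and the `make_execute_request` helper are assumptions for illustration; the slides do not show this code.

```python
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo"):
    """Build the dictionary form of a Jupyter 'execute_request' message.

    This only assembles the message; sending it to a kernel (over ZeroMQ
    or through the server's WebSocket) is outside the scope of the sketch.
    """
    header = {
        "msg_id": uuid.uuid4().hex,        # unique per message
        "session": session_id,             # shared by all messages of a client
        "username": username,              # assumed value
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",                  # assumed protocol version
    }
    return {
        "header": header,
        "parent_header": {},               # empty: this message starts an exchange
        "metadata": {},
        "content": {
            "code": code,                  # source the kernel should run
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


message = make_execute_request("1 + 1", session_id=uuid.uuid4().hex)
print(message["header"]["msg_type"], "->", message["content"]["code"])
```

The same message shape is used regardless of the kernel language, which is what lets one web app drive Python, R, Scala, or Julia kernels.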
12
Jupyter Notebook
● Narratives and Use Cases
Narratives are collaborative, shareable, publishable, and reproducible. We believe that Narratives help both yourself and other researchers by sharing your use of Jupyter projects, technical specifics of your deployment, and installation and configuration tips so that others can learn from your experiences.
From https://jupyter.readthedocs.io/en/latest/use-cases/content-user.html
16
Orioles: A powerful educational narrative
∅MQ
Notebook files
HTTP / WebSockets
Video files
Docker Containers
17
18
Build your own narrative!
What do you need?
1. Understand how to communicate with the Jupyter server
Two ways: WebSockets or HTTP API endpoints
2. Build your own web application
Many ways: e.g. Angular, Polymer, Dart, etc.
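As a rough illustration of the HTTP route, the sketch below fetches JSON from an endpoint using only the Python standard library. So that it runs on its own, it starts a small stand-in server on localhost; that server, the `fetch_json` helper, and the `/api/kernels` path used in the demo are assumptions for illustration, and a real application would point `fetch_json` at a running Jupyter server instead.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body."""
    # No proxy handlers: the sketch only talks to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


class StandInHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a Jupyter server: echoes the requested path."""

    def do_GET(self):
        body = json.dumps({"path": self.path, "status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def demo():
    # Port 0 lets the operating system pick a free port.
    server = HTTPServer(("127.0.0.1", 0), StandInHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch_json(f"http://127.0.0.1:{port}/api/kernels")
    finally:
        server.shutdown()
        server.server_close()


result = demo()
print(result)
```

The WebSocket route follows the same idea but keeps a connection open, which suits streaming cell output; plain HTTP is enough for request/response style endpoints.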
19
Example: autoscience demo
Purpose:
- Quick exploration of data sets
- No coding required
- Visual analysis of outliers
24
Jupyter Gateway: expose API endpoints
Declare the endpoint
Produce the JSON payload
GET http://localhost:8800/cog/datasets/1
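The slide's notebook cell is not preserved in this transcript, so the following is only a sketch of what such a cell could look like, assuming the kernel gateway's notebook-http mode: a cell whose first line is a comment naming the HTTP method and path, a `REQUEST` JSON string injected by the gateway, and the cell's printed output used as the response body. The `DATASETS` store, the `describe` helper, and the sample values are hypothetical; the payload keys mirror the ones the Angular template reads (`ds.rows`, `ds.cols`, `ds.na`, `vars`).

```python
import json

# --- Setup cell: a hypothetical in-memory dataset store ---
DATASETS = {
    "1": {"columns": {"age": [31, None, 45], "city": ["Oslo", "Rome", None]}},
}


def describe(dataset_id):
    """Build a summary payload for one dataset (illustrative fields only)."""
    columns = DATASETS[dataset_id]["columns"]
    n_rows = max(len(values) for values in columns.values())
    # Rows and columns that contain at least one missing value.
    na_rows = sum(
        1 for i in range(n_rows) if any(col[i] is None for col in columns.values())
    )
    na_cols = sum(1 for col in columns.values() if any(v is None for v in col))
    return {
        "ds": {
            "id": dataset_id,
            "rows": n_rows,
            "cols": len(columns),
            "na": {"rows": na_rows, "cols": na_cols},
        },
        "vars": [
            {"id": i, "name": name, "sample": col[:2]}
            for i, (name, col) in enumerate(columns.items())
        ],
    }


# Outside the gateway REQUEST does not exist, so fake one to keep the
# sketch runnable on its own (assumption: path parameters arrive under "path").
try:
    REQUEST
except NameError:
    REQUEST = json.dumps({"path": {"id": "1"}})

# --- Endpoint cell: in the notebook, the annotation below is its first line ---
# GET /cog/datasets/:id
request = json.loads(REQUEST)
payload = describe(request["path"]["id"])
print(json.dumps(payload))  # printed output becomes the JSON response
```

With a gateway serving this notebook, the `GET http://localhost:8800/cog/datasets/1` request shown above would receive the printed JSON as its body.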
25
Jupyter Gateway: consume the data
Consume the JSON payload
GET http://localhost:8800/cog/datasets/1
app.controller('datasetCtrl', function ($scope, $routeParams, $http) {
  var id = $routeParams.id;
  $http({
    method: 'GET',
    url: '/cog/datasets/' + id
  }).then(function successCallback(response) {
    // this callback will be called asynchronously
    // when the response is available
    $scope.d = response.data;
  }, function errorCallback(response) {
    // called asynchronously if an error occurs
    // or server returns response with an error status.
  });
});
26
<div class="row">
  <div class="col-md-9 offset-md-2">
    <p class="small">{{d.ds.rows}} obs. of {{d.ds.cols}} variables <br/>
    NA rows:{{d.ds.na.rows}}, columns:{{d.ds.na.cols}}</p>
  </div>
</div>
...
<tr ng-repeat="v in d.vars">
  <td><a href="#/ds/{{d.ds.id}}/variables/{{v.id}}">{{v.name}}</a></td>
  <td class="small">{{ v.sample.toString() }}</td>
  <td>{{v.type.vtype}}</td>
  <td>{{v.type.tcoerce}}</td>
  <td>{{v.type.unique}}</td>
  <td>{{v.type.nan}}</td>
  <td>{{v.type.valid}}</td>
  <td>{{v.type.quality}}</td>
...
Jupyter Gateway: consume the data $scope.d
Render the angular scope object
28
Jupyter: docker stacks
Docker container: Jupyter notebook + Apache Toree
29
Dockerize your jupyter gateway api
Add the jupyter gateway
FROM jupyter/all-spark-notebook
...
# add some extra packages
ADD packages /srv/
RUN pip install -r /srv/packages
# install the kernel gateway
RUN pip install jupyter_kernel_gateway
ENV JUPYTER_GATEWAY=1
# REST API is designed as notebooks
ADD notebooks /srv/notebooks
Add the notebook which powers the API
30
Dockerize your jupyter gateway api
IMAGE=autoscience/kernel_gateway
docker build -t $(IMAGE) .
docker run --rm -ti -p 8888:8888 $(IMAGE) \
  jupyter kernelgateway --KernelGatewayApp.ip=0.0.0.0 \
  --KernelGatewayApp.port=8888 \
  --KernelGatewayApp.api=notebook-http \
  --KernelGatewayApp.seed_uri=/srv/notebooks/autoscience.ipynb
31
Dockerize your jupyter gateway api
∅MQ
Notebook files
HTTP REST API
Docker Containers
32
Summary
• Jupyter notebook is a great way to create and share
data-driven use cases and projects
• Jupyter is more than notebooks
– gateway, kernels, hub, etc
• Narratives powered by jupyter
– O’Reilly Orioles
– build your own: autoscience example
33
Resources