Data Science Docker• Docker Basics • Docker for Data Science Environments • Connecting your...

Confidential / 1 3 August 2017 3 August 2017 Data Science Docker Dockerize Your Data Science Environment

3 August 2017

Data Science ❤ Docker

Dockerize Your Data Science Environment

Outlook

• About Me, Detego and RFID

• Docker Basics

• Docker for Data Science Environments

• Connecting your Data Science Environment to other Services

About Me, Detego and RFID

Florian Geigl

PhD from Institute of Interactive Systems and Data Science, Graz University of Technology

Short-Term Scholar at Information Sciences Institute, University of Southern California

Working as Data Scientist at Detego

Participated in Kaggle competitions

Detego: Latin for reveal, uncover, display

Located in Graz

~35 employees

Fashion-Retail Industry

International Customers

www.detego.com

Docker Basics

https://www.docker.com/what-docker

“… Developers use Docker to eliminate ‘works on my machine’ problems when collaborating on code with co-workers. …”

“everything required to make a piece of software run is packaged into isolated containers. Unlike VMs, containers do not bundle a full operating system - only libraries and settings required to make the software work are needed. This makes for efficient, lightweight, self-contained systems and guarantees that software will always run the same, regardless of where it’s deployed.”

Why Docker?

• Consistent Environments
  - Linux, macOS, Windows
  - AWS, Azure & many more

• Native Performance
  - + CUDA version

• Resource Savings

• Easy Configuration
  - Pre-built/official images
  - or custom Docker image

• Easy Mounting of Data

How fast can you set up an Apache server?

Switch between Apache versions?

Set up an identical Apache server on Linux, Mac & Windows?

Live Demo: Apache

docker run -it --rm -p 8888:80 -v C:\path\to\data:/usr/local/apache2/htdocs/ httpd

Image != Container

Image == Class

Container == Instance
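The class/instance analogy can be made concrete with a toy Python model. This is only an illustration of the analogy, not the Docker API; the class name `HttpdImage` and the file names are hypothetical.

```python
# Toy model of the analogy only; this is NOT the Docker API.
class HttpdImage:
    """Stands in for an image: a read-only template shared by all containers."""

    default_files = ("index.html",)  # hypothetical content baked into the image

    def __init__(self, name):
        # Each instance stands in for one container with its own writable state.
        self.name = name
        self.files = list(self.default_files)


first = HttpdImage("web-1")
second = HttpdImage("web-2")

first.files.append("scratch.txt")  # a change made inside one container

print(second.files)              # the other container is unaffected
print(HttpdImage.default_files)  # the image itself is unchanged
```

In the same spirit, `--rm` corresponds to discarding an instance once it is no longer needed: the class (image) stays available for the next `docker run`.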

Docker…

• has almost no overhead (no guest operating system, unlike a VM)

• provides near-native performance

• provides a consistent environment

• allows you to build your own Docker image

• runs on Linux, macOS and Windows hosts

• allows you to easily mount data into a container

• starts almost instantly

• …

Building a Docker Data Science Environment

Building your own Docker Images

e.g.: Ubuntu & vim

“Dockerfile”

FROM ubuntu:latest

RUN apt-get update && apt-get install -y vim

docker build .

-> results in a docker image

Docker Data Science Image

Based on Kaggle’s Docker Image: https://hub.docker.com/u/kaggle/

Open-Source: https://github.com/floriangeigl/docker-DataScience

- pull requests are highly welcome

Contained Services:

- Python (2&3)

- R

- Julia

- Jupyter Notebooks

- JupyterLab

- RStudio

“docker pull floriangeigl/datascience”

(-> pulls or updates an image)

Do It – Do It Now!

docker run --rm -it -p 8888:8888 -p 8889:8889 -p 8787:8787 -p 2222:22 -p 9001:9001 -v "${pwd}:/data/" --name dsdocker floriangeigl/datascience /bin/bash

docker run: Creates a container from an image and executes the given command

--rm: Removes the container after shutdown

-it: Keeps STDIN open and allocates a terminal, giving an interactive session

-p: Maps a port from the container to the host machine

(format: HostPort:ContainerPort)

-v: Mounts a directory into the container

(format: HostPath:ContainerPath)

pwd = print working directory = current path (${pwd} is the PowerShell form)

--name: Name given to the container (here: dsdocker)

floriangeigl/datascience: Docker image

/bin/bash: Executed command

Image != Container

Image == Class

Container == Instance
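The flags on this slide can also be assembled programmatically, which makes the HostPort:ContainerPort and HostPath:ContainerPath ordering explicit. The following is a minimal sketch, not part of the original talk: the helper `build_run_command` and its parameters are hypothetical, and it only builds and prints the command rather than starting a container.

```python
import shlex
from pathlib import Path


def build_run_command(image, command, ports, host_dir, container_dir="/data/", name=None):
    """Return the argument list for `docker run`, mirroring the flags described above."""
    args = ["docker", "run", "--rm", "-it"]
    for host_port, container_port in ports:
        # Order matters: -p HostPort:ContainerPort
        args += ["-p", f"{host_port}:{container_port}"]
    # Order matters here too: -v HostPath:ContainerPath
    args += ["-v", f"{host_dir}:{container_dir}"]
    if name:
        args += ["--name", name]
    return args + [image, command]


# Port pairs taken from the command on this slide.
PORTS = [(8888, 8888), (8889, 8889), (8787, 8787), (2222, 22), (9001, 9001)]

cmd = build_run_command(
    image="floriangeigl/datascience",
    command="/bin/bash",
    ports=PORTS,
    host_dir=Path.cwd(),  # plays the role of ${pwd} / $(pwd)
    name="dsdocker",
)
print(shlex.join(cmd))  # inspect the command before running it
# To actually start the container: subprocess.run(cmd, check=True)
```

Because the current directory comes from `Path.cwd()`, the same script works regardless of whether the shell spells it `${pwd}` or `$(pwd)`.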

Live Demo: Data Science Container

8888: Jupyter Notebooks

8889: JupyterLab

8787: RStudio Server

22: ssh (mapped to host port 2222 in the docker run command)

9001: supervisord (status of services; restart services; logs…)

Best Practice #1: Aliases

Windows PowerShell:

run “notepad $PROFILE”

add “function dsdocker {docker run --rm -i -t -p … -v "${pwd}:/data/" …}”

restart PowerShell & use your new “dsdocker” command

Linux & Mac:

add an alias for the command, replacing ${pwd} with $(pwd)

-> see: https://github.com/floriangeigl/docker-DataScience

Best Practice #2 – Fixed Project Structure

• Cookiecutter: https://github.com/drivendata/cookiecutter-data-science

Known Bugs

Issues with the German (de) keyboard layout:

Can’t type “\” on a German keyboard in Chrome & IE

https://github.com/jupyter/notebook/issues/2379#issuecomment-301268937

-> workaround: use Firefox

Connecting to other Services

Databases anyone?

Get your hands dirty with various technologies

https://hub.docker.com/u/library/

Docker-Compose

version: "3.1"
services:
  datascience:
    image: floriangeigl/datascience:latest
    ports:
      - "8888:8888"
      - "8889:8889"
      - "8787:8787"
      - "9001:9001"
    volumes:
      - ./:/data/
    links:
      - mongo
      - cassandra
  mongo:
    image: mongo:latest
    # persistent storage
    volumes:
      - ./data/mongo/:/data/db
  cassandra:
    image: cassandra:latest

Accessible ports: 8888, 8889, 8787, …

Hostname: mongo

Hostname: cassandra
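From inside the datascience container, the other services are addressed by their compose service names. A small sketch, using only the Python standard library, to check from a notebook whether they are reachable. The port numbers are an assumption: they are the usual defaults of the official images (27017 for MongoDB, 9042 for Cassandra's CQL port) and are not listed on the slide; the function names are hypothetical.

```python
import socket

# Hostnames come from the service names in the compose file.
# Ports are assumed defaults (MongoDB 27017, Cassandra CQL 9042), not from the slide.
SERVICES = {"mongo": 27017, "cassandra": 9042}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unknown hostnames.
        return False


def check_services(services=SERVICES):
    """Map each hostname to whether its port accepts connections."""
    return {host: is_reachable(host, port) for host, port in services.items()}
```

Calling `check_services()` in a notebook running in the datascience container returns a dictionary keyed by hostname. Outside the compose stack the names `mongo` and `cassandra` do not resolve, so both entries would be `False`.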

Do It – Do It Now!

Go to /path/to/compose-file

Run “docker-compose(.exe) up”

Use the stack

Shutdown: Ctrl+C

Remove used containers: “docker-compose(.exe) rm”

Questions?