Page 1: Spring 2020 Advanced Python milligan@umn.edu Michael ... · Python 3+ is actively developed minor incompatibilities with Python 2.x code The scientific world has mostly transitioned

MN Supercomputing Institute for Advanced Computational Research

© 2009 Regents of the University of Minnesota. All rights reserved.

Advanced Python for Scientific Computing

Michael Milligan

milligan@umn.edu

Follow along @ https://z.umn.edu/msipython

Spring 2020

To get the most out of this...

• Basic knowledge of: Linux, MSI, Python

• Access to campus network

o Use Campus VPN if you are watching from off-campus

• Working MSI login

• Follow along! https://z.umn.edu/msipython

• Many examples will work at home!

o Canopy or Anaconda provide easy-to-install Python with scientific and math libraries included

o A BinderHub link at the above URL provides a scratch environment that anyone can use online

Tutorial Outline

MORNING SESSION: 10 am - noon

• Connecting with Python @ MSI

• Getting Started with Python

• Math with Python

AFTERNOON SESSION: 1 pm - 3 pm

• Visualization and plotting

• Data processing

• Scaling up to supercomputing

When you leave today, you should be able to...

• Program interactively with Python Notebooks

• Understand the basics of numpy and scipy

• Efficiently compute with large arrays of data

• Load and save data using files and web resources

• Use matplotlib to visualize data

• Take advantage of supercomputing resources with batch jobs and parallel computing

• Know where to turn for more help with these topics!

Connecting with Python @ MSI

Infrastructure, tools, and background

MSI Systems Overview

Why Interactive Computing?

• Immediate feedback

• Prototyping workflows

o Design your workflow for a small set of cores

o Discover and test new tools/concepts

o Profile, optimize and debug

• Data Visualization and Exploration

• Python has great tools for interactive computing

Check out our Interactive Computing tutorial for more!

Jupyter Notebooks Server

• https://notebooks.msi.umn.edu

• Access a web-based environment providing Jupyter Notebook document-oriented computing

• Currently supports Python 2, Python 3, and R

• In-browser terminal and file browsing

• MSI Beta service - expect changes and updates

Notebooks = Jobs

Main compute on Mesabi

● Default “interactive queue”

● Larger memory option

● Longer runtime option

Job Queues on Mesabi

https://www.msi.umn.edu/queues

Command Line Python

Run Python from an MSI command line prompt.

This is the best way to run pre-existing Python scripts.

Easy to transform command line work into batch jobs.

module load <python module>

python my-cool-program.py
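
As a hedged illustration of a script that runs well this way, the sketch below is a hypothetical my-cool-program.py (the file name comes from the slide; the arguments, the demo default values, and the computation are invented for illustration). Keeping the work in functions behind an `if __name__ == "__main__"` guard means the same file can be run from the command line or a batch job, or imported into a notebook.

```python
"""my-cool-program.py: a hypothetical command-line script (illustrative only)."""
import argparse
import statistics


def summarize(values):
    """Return the mean and sample standard deviation of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


def main(argv=None):
    # Parse numbers given on the command line, e.g.
    #   python my-cool-program.py 1 2 3 4
    # With no arguments, a small made-up demo list is used instead.
    parser = argparse.ArgumentParser(description="Summarize some numbers.")
    parser.add_argument("values", nargs="*", type=float,
                        default=[1.0, 2.0, 3.0, 4.0],
                        help="numbers to summarize (at least two)")
    args = parser.parse_args(argv)
    mean, stdev = summarize(args.values)
    print(f"mean={mean:.3f} stdev={stdev:.3f}")
    return mean, stdev


if __name__ == "__main__":
    main()
```

Because `main()` accepts an optional argument list, it can also be called directly from IPython or a notebook, e.g. `main(["1", "2", "3", "4"])`.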

Command Line IPython

IPython - the Interactive Python environment

● IPython is also the basis of the Jupyter notebook

● If you are already at an MSI command-line terminal, IPython is convenient and powerful

module load <python module>

ipython

Python modules

% module avail python

-------------- /panfs/roc/soft/modulefiles.hpc --------------
python/3.7.1_anaconda    python3/3.7.1_anaconda    python2/2.7.15_anaconda

------------ /panfs/roc/soft/modulefiles.common -------------
python/3.2.3    python/3.4    python/3.6.3(default)
python2/2.7.12_anaconda4.1.1    python2/2.7.12_anaconda4.2(default)    python2/2.7.16_anaconda2019.10
python3/3.4    python3/3.5.2_anaconda4.1.1    python3/3.6.3_anaconda5.0.1(default)    python3/3.7.4_anaconda2019.10

Most of the time this is all you need:

% module load python

What kind of Python?

Python comes in multiple versions and distributions.

● Python 3+ is actively developed

○ Minor incompatibilities with Python 2.x code

● The scientific world has mostly transitioned from Python 2.7 to Python 3+

● You should use Python 3 for new projects

● Consider updating smaller scripts to use Python 3

● It is fine to use Python 2.7 to access software that has not been updated

○ BUT Python 2.7 reached end of life on January 1, 2020

● Prefer our installed distributions; most packages you need will already be there
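
A quick way to make sure a script is running under the intended interpreter is to check `sys.version_info` at startup. The sketch below is illustrative; the error message and the suggested module name (python3, taken from the module listing above) are assumptions. It also shows one of the "minor incompatibilities": integer division.

```python
import sys


def check_python3():
    """Stop early with a clear message if run under Python 2."""
    if sys.version_info < (3, 0):
        raise RuntimeError("This script needs Python 3; try 'module load python3' first")
    return tuple(sys.version_info[:2])


print("Running Python %d.%d" % check_python3())

# One of the "minor incompatibilities": in Python 3, / on integers is true
# division and // is floor division. Python 2 evaluated 7 / 2 as 3.
print(7 / 2, 7 // 2)   # 3.5 3
```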

Where to run Python?

Mesabi HPC:

● Jupyter Notebooks via nb.msi.umn.edu

● SSH or NX Terminal: log in and then login> ssh mesabi

● NICE Desktop Terminal: ssh mesabi

● Batch Jobs via qsub

Other options:

● Mangi HPC: log in and then login> ssh mangi

● NICE Desktop Terminal: just run python

● And various other places

To get the most out of this... revisited

• Basic knowledge of: Linux, MSI, Python

• Access to campus network

o Use Campus VPN if you are watching from off-campus

• Working MSI login

• Follow along! https://z.umn.edu/msipython

• Many examples will work at home!

o Canopy or Anaconda provide easy-to-install Python with scientific and math libraries included

No, seriously. Follow along! z.umn.edu/msipython

Visit nb.msi.umn.edu to start your notebook.

Clone the GitHub repository of examples.

Or run the repository through BinderHub directly from GitHub.

Python Basics

Starting from “Advanced Python.ipynb”

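Modules and Packages

Software libraries are organised into modules:

import tool

Searches the directories on sys.path in order (starting with the directory of the running script, or the current directory in an interactive session, so a local tool.py shadows an installed module of the same name) and in each one tries to load:

tool/__init__.py (a package)

tool.py (a module)

from tool import widget

is roughly equivalent to:

import tool

widget = tool.widget

except that the name tool itself is not left defined.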
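
The sketch below checks these rules against the standard library instead of the hypothetical tool/widget names on the slide: it confirms that `from math import sqrt` binds the same object as `math.sqrt`, prints the first few sys.path entries, and uses `importlib.util.find_spec()` to see where an import would come from without actually importing it (json happens to be a package, csv a plain module).

```python
import importlib.util
import sys

# "from math import sqrt" is roughly equivalent to the two lines below,
# except that it does not leave the name "math" bound in your namespace.
import math
sqrt_via_module = math.sqrt

from math import sqrt
assert sqrt is sqrt_via_module  # both names refer to the same function object

# sys.path is the ordered list of places Python searches for imports.
print(sys.path[:3])

# find_spec() reports where an import would come from, without importing it.
spec = importlib.util.find_spec("json")        # json is a package
print(spec.origin)                             # .../json/__init__.py
print(spec.submodule_search_locations is not None)   # True for packages

csv_spec = importlib.util.find_spec("csv")     # csv is a single-file module
print(csv_spec.submodule_search_locations)     # None for plain modules
```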

Python Standard Library

Python aims to come “batteries included”

Very wide range of functionality included in the Python Standard Library

Programs using only Standard Library should run in any Python install (some caveats for older/Windows OS)

The current list is here: https://docs.python.org/3/library/

Getting More Libraries

PIP Local Install

Several ways to install new modules, but 90% of the time this will work:

pip install --user <package name>

or even

pip install --user git+https://github.com/python-tool

● Use virtual environments for isolation if needed

● If in doubt, check the package’s README

● Email us for more assistance
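
When a freshly installed package does not show up, it helps to ask the running interpreter where user installs go and whether a virtual environment is active. This is a small sketch using only the standard library; the comparison of `sys.prefix` with `sys.base_prefix` is how the built-in venv module (and recent virtualenv) can be detected, and a conda environment is a full installation of its own, so it will typically report False here.

```python
import site
import sys


def describe_install_locations():
    """Report where `pip install --user` puts packages and whether a venv is active."""
    return {
        # Per-user site-packages directory used by `pip install --user`
        "user_site": site.getusersitepackages(),
        # In a venv, sys.prefix points at the environment while
        # sys.base_prefix still points at the Python it was created from.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # The interpreter actually running this code
        "python": sys.executable,
    }


for key, value in describe_install_locations().items():
    print(f"{key}: {value}")
```

Running pip as `python -m pip install --user <package name>` ties the install to the same interpreter reported as "python" above.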

Getting More Libraries

CONDA Environments

Anaconda environments are based on the conda package manager and an ecosystem of conda repositories

conda env list - lists available environments

To create and use a personal environment with packages of your choosing:

conda create -n <name> python pkg1 pkg2

source activate <name>

conda install pkg3 pkg4
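
From inside Python (a script or a notebook), you can check which conda environment is active and which interpreter is running. This sketch assumes the CONDA_DEFAULT_ENV variable that conda's activation scripts set; it returns None when no environment has been activated in the calling shell.

```python
import os
import sys


def active_conda_env():
    """Return the active conda environment name, or None if none is active.

    Assumes CONDA_DEFAULT_ENV, which conda's activate scripts set.
    """
    return os.environ.get("CONDA_DEFAULT_ENV")


print("conda env:", active_conda_env())
print("interpreter:", sys.executable)  # should live inside the environment's directory
```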

PIP vs CONDA for installing modules

PIP:

● Works great for simple modules

● Integrates with existing Python modules and managed install

● Tied to a specific Python version (e.g. 2.7, 3.4, 3.6)

● Can install packages directly from GitHub or local downloads

● Almost all Python software is available through PyPI

CONDA:

● Works great for simple or complex modules

● Creates a standalone, self-contained “environment”

● Installed modules are tied to a specific environment that must be separately activated

● Better for software with complex or “picky” dependencies

● Most widely used software is available through conda channels

Can I use new modules with Jupyter?

YES

PIP installed modules: should just work

CONDA installed modules: need to add a new “kernelspec”

1. Activate the environment normally from a terminal (use a command like "source activate ...")

2. Run the command: jupyter kernelspec list

3. Note the path printed for the environment you want to use

4. Substitute that path in place of MYPATH below, and replace “myenv” with any unique name

5. Run the command:

jupyter kernelspec install --user --name=myenv MYPATH

Optional: find that kernel.json file and change its “display_name” field to something descriptive.
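
The optional last step can be scripted. The sketch below edits the "display_name" field of a kernel.json file with the standard json module. The function name is hypothetical, and the example directory in the comment is an assumption (on Linux, user-installed kernels typically live under ~/.local/share/jupyter/kernels/); use the path your own `jupyter kernelspec list` prints.

```python
import json
from pathlib import Path


def set_kernel_display_name(kernel_dir, display_name):
    """Rewrite the "display_name" field in <kernel_dir>/kernel.json (hypothetical helper)."""
    path = Path(kernel_dir).expanduser() / "kernel.json"
    spec = json.loads(path.read_text())
    spec["display_name"] = display_name  # the name shown in Jupyter's kernel menu
    path.write_text(json.dumps(spec, indent=1))
    return spec


# Hypothetical usage; substitute the directory printed by `jupyter kernelspec list`:
# set_kernel_display_name("~/.local/share/jupyter/kernels/myenv", "Python (myenv)")
```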

High Performance Math

NumPy and SciPy

NumPy and SciPy

NumPy provides:

● Basic array and matrix data types

● Efficient implementations of low-level math operations

● A large library of high-level math functions built from efficient primitives

SciPy provides:

● A home for a wide variety of open-source mathematical and scientific algorithms

● Modules for optimization, signal processing, linear algebra, statistics, interpolation, and more
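
A minimal sketch of the division of labour described above: NumPy supplies the array type and fast element-wise arithmetic, and SciPy supplies a higher-level algorithm (here the Brent root finder, chosen only as an example) that works on ordinary Python or NumPy functions.

```python
import numpy as np
from scipy import optimize

# NumPy: an array type plus fast element-wise ("vectorized") math
x = np.linspace(0.0, 1.0, 5)          # [0, 0.25, 0.5, 0.75, 1]
y = x**2 + 1                          # computed for every element at once, no Python loop
print(y.sum())                        # 6.875

# SciPy: higher-level algorithms built on NumPy, here a root finder for t**2 = 2
root = optimize.brentq(lambda t: t**2 - 2.0, 0.0, 2.0)
print(root, np.sqrt(2.0))             # both approximately 1.41421356
```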

NumPy and SciPy

Let’s explore these further in Jupyter…

Start from notebook/AdvPyFiles/01.NumpyArrays

More useful NumPy modules...

numpy.fft – FFTs, forward/inverse, 1-D and N-D

numpy.random – generate random numbers, many distributions to choose from

numpy.matrix – special arrays that obey matrix math (no longer recommended by NumPy; regular arrays with the @ operator do the same job)

numpy.polynomial – module for representing and manipulating arbitrary polynomials
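
A short tour of these modules, as a sketch rather than a complete reference. It assumes NumPy 1.17 or newer for the `default_rng` random Generator API, and uses plain arrays with `@` for matrix products.

```python
import numpy as np
from numpy.polynomial import Polynomial

# numpy.random: Generator API (NumPy 1.17+); a fixed seed makes results repeatable
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=10_000)
print(samples.mean(), samples.std())   # close to 0 and 1

# numpy.fft: a pure tone with 4 cycles over 64 samples peaks at frequency bin 4
t = np.arange(64)
signal = np.sin(2 * np.pi * 4 * t / 64)
spectrum = np.fft.rfft(signal)
print(np.argmax(np.abs(spectrum)))     # 4

# numpy.polynomial: p(x) = 1 + 2x + 3x^2 (coefficients in increasing order)
p = Polynomial([1, 2, 3])
print(p(2.0))                          # 17.0
print(p.deriv().coef)                  # coefficients of the derivative, 2 + 6x

# Matrix math on plain arrays with the @ operator
A = np.array([[1, 2], [3, 4]])
b = np.array([1, 1])
print(A @ b)                           # [3 7]
```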

SciPy expands the menu:

•Clustering algorithms (scipy.cluster)

•Integration and ODEs (scipy.integrate)

•Interpolation (scipy.interpolate)

•Input and output (scipy.io)

•Linear algebra (scipy.linalg)

•Multi-dimensional image processing (scipy.ndimage)

•Optimization and root finding (scipy.optimize)

•Signal processing (scipy.signal)

•Sparse matrices (scipy.sparse)

•Spatial algorithms and data structures (scipy.spatial)

•Special functions (scipy.special)

•Statistical functions (scipy.stats)

•And then some…
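
To give a feel for a few of these subpackages, the sketch below makes one representative call each from scipy.integrate, scipy.interpolate, scipy.optimize, and scipy.stats. The functions chosen (quad, CubicSpline, minimize_scalar, norm) are common entry points, not the only options in each subpackage.

```python
import numpy as np
from scipy import integrate, interpolate, optimize, stats

# scipy.integrate: definite integral of sin(x) from 0 to pi (exact answer: 2)
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)
print(area)                    # approximately 2.0

# scipy.interpolate: a cubic spline through samples of x**2
xs = np.linspace(0.0, 10.0, 11)
spline = interpolate.CubicSpline(xs, xs**2)
print(spline(2.5))             # approximately 6.25

# scipy.optimize: minimize a one-dimensional function
result = optimize.minimize_scalar(lambda x: (x - 3.0)**2 + 1.0)
print(result.x)                # approximately 3.0

# scipy.stats: the standard normal distribution
print(stats.norm.cdf(0.0))     # 0.5
print(stats.norm.ppf(0.975))   # approximately 1.96
```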

SciPy is also fast!

● Most SciPy routines use NumPy for fast low-level math operations

● Some SciPy routines use highly optimized external libraries

● E.g. scipy.linalg links to BLAS, LAPACK or MKL behind the scenes

● For even faster Python, ask us about the experimental optimized Intel Python Distribution
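
To see which optimized libraries your own installation uses, NumPy can print its build configuration; the output depends entirely on how NumPy was built and installed. The rest of the sketch solves a linear system with scipy.linalg.solve, which hands the work to LAPACK. The matrix is random and made well-conditioned on purpose, purely for illustration.

```python
import numpy as np
from scipy import linalg

# Print the BLAS/LAPACK libraries this NumPy build is linked against
np.show_config()

# Solve A x = b; scipy.linalg.solve calls LAPACK routines internally
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500)) + 500 * np.eye(500)  # strongly diagonal, well-conditioned
b = rng.standard_normal(500)
x = linalg.solve(A, b)
print(np.allclose(A @ x, b))   # True
```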

Data Visualization

Matplotlib, friends, and alternatives

Plotting made easy

● Matplotlib provides publication-quality 2-D plotting (with some 3D capabilities too)

○ Display in a window or output to PDF, SVG, PNG, etc.

○ Implemented as a modular object-oriented system

○ IPython: enable with %matplotlib

● Pyplot provides a Matlab-ish interactive plotting interface to Matplotlib

○ Usually accessed via import matplotlib.pyplot as plt (the older pylab interface is discouraged)

○ Defaults to popping up plots in a separate window

○ In a notebook: use %matplotlib inline to enable plots in the browser window
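
A minimal sketch of the object-oriented interface writing straight to a file. It selects the non-interactive "Agg" backend so it also works where no display is available, such as a batch job; in a notebook you would drop the `matplotlib.use` line and rely on %matplotlib inline instead. The function name and default file name are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # file-only backend: no display needed (e.g., in batch jobs)
import matplotlib.pyplot as plt
import numpy as np


def plot_sine(path="sine.png"):
    """Plot sin(x) with the object-oriented interface and save it to a file."""
    x = np.linspace(0.0, 2.0 * np.pi, 200)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(x, np.sin(x), label="sin(x)")
    ax.set_xlabel("x")
    ax.set_ylabel("sin(x)")
    ax.legend()
    fig.savefig(path, dpi=150)  # format chosen from the extension: .png, .pdf, .svg, ...
    plt.close(fig)              # free memory when making many figures
    return path
```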

More plotting resources

These are built on matplotlib and will interoperate well

● Seaborn specializes in statistical plots, provides similar styles and capabilities to popular R plotting packages

● Pandas has built-in plotting with some convenience features for working from DataFrames

● Basemap, Cartopy, etc. for GIS graphics needs
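
As an example of the "built on matplotlib" point, pandas' `DataFrame.plot()` draws with matplotlib and returns a matplotlib Axes, so regular matplotlib calls can customize the result. The small dataset and the helper name below are made up for illustration, and the Agg backend line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # file-only backend; leave this out in a notebook
import numpy as np
import pandas as pd


def plot_frame(df, path):
    """Line plot of df["value"] against df["time"], saved to path."""
    # DataFrame.plot() returns a matplotlib Axes, so matplotlib calls still work.
    ax = df.plot(x="time", y="value", kind="line", title="value vs. time")
    ax.set_ylabel("value")
    ax.figure.savefig(path)
    return ax


# A small made-up dataset
df = pd.DataFrame({"time": np.arange(10), "value": np.arange(10) ** 2})
```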

There are other plotting mechanisms out there too:

● Bokeh focuses on interactive web visualizations

● Plotly integrates with plot.ly for interactive web

● ggplot aims to replicate ggplot2 from R

● MayaVi: high-performance 3D modelling/volumetric app

Data Processing

Loading, saving, and processing datasets

Data on Disk

Chances are you want to load and save data!

● numpy and scipy.io offer a variety of generic options

● pandas can read and write many tabular formats

● Several specialized options for large/complex data

Text files: portable

● Very common for smaller data sets:

○ Simple columns of numbers

○ Human readable in a pinch

● numpy.loadtxt() – simple interface, good defaults

● numpy.genfromtxt() – more complex; handles unusual formatting, comments, missing values, etc.
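
The two functions side by side, as a sketch. Here `io.StringIO` stands in for a file so the example is self-contained; with real data you would pass a filename instead. The sample text is made up, and the second sample has a missing value that `genfromtxt` fills with NaN.

```python
import io

import numpy as np

# Simple whitespace-separated columns; lines starting with '#' are skipped by default
clean = io.StringIO("# x  y\n1 2\n3 4\n5 6\n")
data = np.loadtxt(clean)
print(data.shape)        # (3, 2)

# Comma-separated with a header row and a missing value in the second data row
messy = io.StringIO("x,y\n1,2\n3,\n5,6\n")
table = np.genfromtxt(messy, delimiter=",", skip_header=1, filling_values=np.nan)
print(table[1])          # second row: 3 followed by nan
```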

Text files: portable

More common options

● CSV files

○ Pandas: fast, good defaults

○ csv module: standard library option

● JSON

○ Pandas: fast, good defaults (again!)

○ json module: standard library
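
A sketch comparing the two routes for each format, on a tiny made-up table. pandas infers column types, while the standard-library csv module returns every field as a string; for JSON, pandas writes the data and the json module reads it back (and pandas can read it too).

```python
import csv
import io
import json

import pandas as pd

csv_text = "name,value\nalpha,1\nbeta,2\n"

# pandas: parses the header and infers column types (value becomes integers)
df = pd.read_csv(io.StringIO(csv_text))

# csv module: standard library only; every field comes back as a string
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(dict(rows[0]))     # {'name': 'alpha', 'value': '1'}

# JSON round trip: pandas writes it, the json module reads it back
json_text = df.to_json(orient="records")
records = json.loads(json_text)
print(records)           # [{'name': 'alpha', 'value': 1}, {'name': 'beta', 'value': 2}]
df_again = pd.read_json(io.StringIO(json_text), orient="records")
```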

Binary formats

● Binary data is much more scalable

● Smaller files on disk

● Faster to load and save

● May be necessary to exchange data with other software

● Stick to portable (machine-independent) formats

Binary formats

•NumPy native format (.npy)

•numpy.load() and numpy.save()

•Or use numpy.savez() to store many arrays in one .npz file (numpy.savez_compressed() also compresses it)

•Fast, portable, but mostly only supported by Python

• scipy.io.matlab – support for Matlab (.mat)

• scipy.io.loadmat() and scipy.io.savemat()

• scipy.io.idl – read (no save) IDL .sav files

• scipy.io.readsav()
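
The sketch below round-trips two small arrays through .npy, .npz, and .mat files in a temporary directory. The file and function names are illustrative. One detail worth knowing: savemat stores 1-D arrays as MATLAB row vectors by default, so they come back from loadmat as 2-D arrays of shape (1, N).

```python
import os
import tempfile

import numpy as np
from scipy import io as sio


def roundtrip(directory):
    """Save two arrays in .npy, .npz, and .mat formats, then load them back."""
    a = np.arange(6.0).reshape(2, 3)
    b = np.linspace(0.0, 1.0, 4)

    # .npy: one array per file
    npy_path = os.path.join(directory, "a.npy")
    np.save(npy_path, a)
    a_npy = np.load(npy_path)

    # .npz: several named arrays in one compressed file
    npz_path = os.path.join(directory, "arrays.npz")
    np.savez_compressed(npz_path, a=a, b=b)
    with np.load(npz_path) as archive:
        b_npz = archive["b"]

    # .mat: readable from MATLAB; 1-D arrays come back as (1, N) rows
    mat_path = os.path.join(directory, "arrays.mat")
    sio.savemat(mat_path, {"a": a, "b": b})
    mat = sio.loadmat(mat_path)
    return a_npy, b_npz, mat["a"], mat["b"]


with tempfile.TemporaryDirectory() as tmp:
    a_npy, b_npz, a_mat, b_mat = roundtrip(tmp)
    print(a_npy.shape, b_npz.shape, a_mat.shape, b_mat.shape)  # (2, 3) (4,) (2, 3) (1, 4)
```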

Binary formats (recommended)

● HDF5 - compact table-oriented datasets

○ pytables, pandas, h5py

● NetCDF4 - similar to HDF5

○ Prefer HDF5 for new projects

● Parquet - gaining traction in data science

○ fastparquet, pandas, several others

● Pickle - Python’s native serialization format

○ E.g. for complex data structures, data-with-code, etc.
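
Pickle needs only the standard library, so it is the easiest of these formats to try. The sketch serializes a made-up nested structure to bytes and back; for files, the same module offers pickle.dump and pickle.load on a file opened in binary mode.

```python
import pickle

# An arbitrary nested structure: dicts, lists, and sets all pickle fine
state = {
    "params": {"alpha": 0.5, "steps": 100},
    "history": [1.0, 0.5, 0.25],
    "tags": {"test", "demo"},
}

# Serialize to bytes and back (use pickle.dump/pickle.load with files opened "wb"/"rb")
blob = pickle.dumps(state, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
print(restored == state)   # True

# Caution: unpickling can execute arbitrary code, so only load pickles you trust.
```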

Scaling up to HPC

Acceleration and parallelism

Further Reading

● Official Python Documentation: https://docs.python.org/3/

● Scientific Python Docs Hub: https://scipy.org/docs.html

● IPython Cookbook: http://ipython-books.github.io/

● Jupyter Notebook Gallery: https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks

Supplemental Material

Additional Documentation

• Overview of IPython's architecture for parallel and distributed computing.

• Detailed discussion of IPython cluster controller and engines.

• Discussion of IPython magic commands

• Official IPython documentation

Batch vs. Interactive Computing

                    Batch (qsub)   Interactive SSH (qsub -I)   Interactive Desktop (Linux)   Interactive Desktop (Windows)
Wall clock limit    696 hours      696 hours [1,2]             24 hours                      48 hours
Requires SUs        ✔              ✔ [3]                       ✕ [4]                         ✕
Memory limit        1 TB           1 TB                        16 GB                         61 GB
Core limit          8640           8640                        4                             8
Software modules    400+           400+                        400+                          50+
GPUs                ✔              ✔                           ✔                             ✔
GUIs                ✕              ✔                           ✔                             ✔

1. Don’t be a jerk
2. Larger requests receive lower priority
3. Resource dependent
4. Subject to change

Batch Jobs

• When should you use Batch Jobs?

o Whenever possible! This is the traditional way to work in HPC

o “Don’t Be a Jerk”; share resources and be considerate of other researchers

• What are the benefits of Batch Jobs?

o Headless execution of automated processes

o Long runtimes

o Large core counts

o A scheduler packs jobs onto the hardware to maximize utilization, reduce latency, etc.

Batch Example

• The Job Script (hello.pbs): a Bash shell script containing #PBS directives and commands

• Submit this to a queue on Mesabi, Itasca, or the Lab Queue*: qsub hello.pbs
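
The contents of hello.pbs are not reproduced here. As a hedged illustration of the kind of headless Python program such a job script might run (for example with `module load python` followed by `python hello.py`, where hello.py is a hypothetical name), the sketch below reports where it is running. It reads PBS_JOBID and PBS_O_WORKDIR, variables that PBS/Torque sets inside a job, and falls back to placeholders when run outside one.

```python
"""hello.py: a hypothetical payload for a batch job script such as hello.pbs."""
import os
import platform


def job_summary():
    """Describe where this process is running, using PBS variables if present."""
    # PBS_JOBID and PBS_O_WORKDIR are set by PBS/Torque inside a job;
    # outside a job they are absent, so fall back to placeholders.
    return {
        "host": platform.node(),
        "job_id": os.environ.get("PBS_JOBID", "not in a PBS job"),
        "submit_dir": os.environ.get("PBS_O_WORKDIR", os.getcwd()),
        "cpus_visible": os.cpu_count(),
    }


if __name__ == "__main__":
    for key, value in job_summary().items():
        print(f"{key}: {value}")
```

In a batch job, anything printed this way ends up in the job's output file rather than on a terminal.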

Interactive Batch

Interactive SSH (qsub -I)

• ssh to login.msi.umn.edu, then to lab (or mesabi, or itasca)

• qsub -I -l walltime=$W,nodes=$N:ppn=$P

Pre-requisite: VPN (for Non-UMN Networks)

https://it.umn.edu/virtual-private-network-vpn

Pre-requisite: SSH (for Windows Users)

http://www.putty.org/

Interactive Jobs with “qsub -I”

Selecting Resources

• SSH to any cluster headnode:

o ssh login.msi.umn.edu

o ssh mesabi

• Queue an Interactive (-I) job on any cluster:

o qsub -I [options]

Selecting Resources

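Options to qsub

o qsub -I [options]

• -l walltime=W (maximum run time, e.g. 1:00:00)

• -l nodes=X:ppn=Y (X nodes with Y processors per node)

• -l pmem=M -OR- -l mem=M (memory per process, or total memory for the job)

• -q QueueName (submit to a specific queue)

• -A groupname (charge the job to a specific group)

• -l gres=MATLAB+4

o Enable graphics via X11 forwarding (-X)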
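The scheduler ends the job when its walltime runs out, so a long Python computation may want to track how much time remains, for example to write a checkpoint first. A minimal sketch, assuming the walltime is written as [[HH:]MM:]SS like the 1:00:00 used later in these slides (day fields are not handled); the function names are hypothetical, and the clock here starts when the script starts, not when the job did.

```python
import time


def walltime_to_seconds(walltime):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' (e.g. '1:00:00') to seconds."""
    parts = [int(p) for p in walltime.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported walltime format: {walltime!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part  # shift one base-60 "digit" left
    return seconds


# Measured from script start, which is slightly later than job start.
_START = time.monotonic()


def seconds_left(walltime):
    """Approximate walltime remaining for this script."""
    return walltime_to_seconds(walltime) - (time.monotonic() - _START)


if __name__ == "__main__":
    remaining = seconds_left("1:00:00")
    print(f"about {remaining:.0f} s of walltime left")
    if remaining < 300:
        print("less than 5 minutes left: time to checkpoint")
```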


Software Modules

• See all available modules

o module avail

• Load a module (adds commands to your shell)

o module load matlab

• Run the software from the module

o matlab
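A module changes only the environment of the shell that loaded it; programs started from that shell, including Python, inherit the change. Environment Modules and Lmod both record the loaded modules in the colon-separated LOADEDMODULES variable. Assuming MSI's module system does the same, a script or notebook can check what was loaded before it started; `loaded_modules` is a hypothetical helper name.

```python
import os


def loaded_modules(environ=None):
    """Return the modules listed in LOADEDMODULES, or [] if it is unset.

    Assumes an Environment Modules/Lmod setup that exports LOADEDMODULES
    as a colon-separated list of module names.
    """
    env = os.environ if environ is None else environ
    value = env.get("LOADEDMODULES", "")
    return [name for name in value.split(":") if name]


if __name__ == "__main__":
    print(loaded_modules())
```

Running `module load` from inside Python (e.g. via subprocess) would only change the child shell, so load modules before starting Python.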


Simple SSH Keys

• Generate a new key

o ssh-keygen -t rsa -f ~/.ssh/id_rsa

• Authorize the public key for SSH

o Append ~/.ssh/id_rsa.pub to login.msi.umn.edu:~/.ssh/authorized_keys (ssh-copy-id -i ~/.ssh/id_rsa.pub login.msi.umn.edu does this for you)

• Add the key to your SSH agent

o ssh-add ~/.ssh/id_rsa

• Log in, forwarding your agent (-A) and X11 (-X)

o ssh -AX login.msi.umn.edu
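The append step can also be scripted. A minimal sketch in Python, assuming the public key file has already been copied to your MSI account and that you run this on login.msi.umn.edu; `authorize_key` is a hypothetical name. It appends the key only if it is not already listed, then tightens permissions, because with OpenSSH's default StrictModes setting sshd refuses an authorized_keys file or ~/.ssh directory that others can write to.

```python
import os
from pathlib import Path


def authorize_key(pub_key_path, ssh_dir=None):
    """Append a public key to authorized_keys unless it is already listed.

    Returns True if the key was added, False if it was already present.
    """
    ssh_dir = Path(ssh_dir) if ssh_dir is not None else Path.home() / ".ssh"
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"

    key = Path(pub_key_path).read_text().strip()
    text = auth_file.read_text() if auth_file.exists() else ""
    already_present = key in (line.strip() for line in text.splitlines())

    if not already_present:
        # Make sure the new key starts on its own line.
        prefix = "" if not text or text.endswith("\n") else "\n"
        with auth_file.open("a") as fh:
            fh.write(prefix + key + "\n")

    # sshd's StrictModes check rejects files writable by group or others.
    os.chmod(ssh_dir, 0o700)
    os.chmod(auth_file, 0o600)
    return not already_present
```

Calling it twice is harmless: the second call finds the key and only re-applies the permissions.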


Launch IPython within NICE

• This demo benefits from setting up an SSH Key

• Open a Terminal (Menu -> System -> Terminal)

o module load python-epd

o ipython notebook

• An IPython dashboard should open in your web browser
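Once the dashboard is open, a quick first cell can confirm that the notebook kernel is running the Python from the module you loaded rather than a system Python. A minimal sketch using only the standard library; the exact path to expect for python-epd is not given in these slides.

```python
import platform
import sys


def interpreter_summary():
    """Return where this kernel's Python lives and which version it is."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": platform.python_version(),
    }


for name, value in interpreter_summary().items():
    print(f"{name:>10}: {value}")
```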


Load an Example

• Fetch some example notebooks

o git clone https://github.com/mbmilligan/msi-ipython-nb-ex.git

• In the notebook dashboard, browse to the msi-ipython-nb-ex folder and open any *.ipynb file
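If the repository holds many notebooks, listing them from Python can be quicker than clicking through the dashboard. A minimal sketch, assuming the clone landed in ./msi-ipython-nb-ex (the directory name git uses by default for that URL); `find_notebooks` is a hypothetical helper, and it skips the .ipynb_checkpoints copies the notebook server saves automatically.

```python
from pathlib import Path


def find_notebooks(root="msi-ipython-nb-ex"):
    """List .ipynb files under root, skipping autosave checkpoints."""
    root = Path(root)
    return sorted(
        path for path in root.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


if __name__ == "__main__":
    for notebook in find_notebooks():
        print(notebook)
```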


Launch Notebook on Mesabi

• Use Mesabi for larger resources

• Start an interactive job on Mesabi (1 hour, 1 node, 8 CPUs, 2 GB per process):

o ssh login

o ssh mesabi

o On mesabi: qsub -I -l walltime=1:00:00 -l nodes=1:ppn=8 -l pmem=2gb
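Inside the job it is worth checking how many cores Python should use before sizing a worker pool: os.cpu_count() reports every core on the node, which may be more than the ppn you requested. A minimal sketch, assuming Linux (os.sched_getaffinity is not available everywhere) and assuming Torque exports the requested ppn as PBS_NUM_PPN in the job environment; if neither the affinity mask nor that variable reflects your request, use the ppn value you passed to qsub.

```python
import os


def usable_cores(environ=None):
    """Cores this process should use: min(requested ppn, CPUs it may run on)."""
    env = os.environ if environ is None else environ
    requested = env.get("PBS_NUM_PPN")  # assumed Torque variable; unset outside a job
    if hasattr(os, "sched_getaffinity"):
        available = len(os.sched_getaffinity(0))  # 0 means "this process"
    else:
        available = os.cpu_count() or 1
    if requested and requested.isdigit():
        return min(int(requested), available)
    return available


if __name__ == "__main__":
    n = usable_cores()
    print(f"node reports {os.cpu_count()} CPUs; using {n}")
    # Size worker pools from n, e.g. multiprocessing.Pool(processes=n),
    # rather than from os.cpu_count().
```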


Launch Notebook on Mesabi

• Extra step due to firewalls: use the INternal (InfiniBand) network

• Start the IPython Notebook on the INternal network address:

o cnXXX> module load python-epd

o cnXXX> ipython notebook --no-browser --ip=in-$(hostname).mesabi

• The ipython command will output lines like:

o [I 10:02:58.191 NotebookApp] 0 active kernels
  [I 10:02:58.191 NotebookApp] The IPython Notebook is running at: http://in-cn0658.mesabi:8888/
  [I 10:02:58.191 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

• Paste the URL into your web browser
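When the server is started from a script rather than by hand, the address and URL can be computed or extracted instead of copied. A minimal sketch, assuming the in-<hostname>.mesabi naming shown above; the helper names are hypothetical. Newer Jupyter releases word their startup messages differently, so the parser simply takes the first http(s) URL it sees rather than matching the exact log text.

```python
import re
import socket

# First http:// or https:// URL in the text, up to the next whitespace.
URL_RE = re.compile(r"https?://\S+")


def internal_address(hostname=None):
    """Name passed to --ip, e.g. 'in-cn0658.mesabi' (naming assumed from the slide)."""
    host = hostname or socket.gethostname().split(".")[0]
    return f"in-{host}.mesabi"


def find_notebook_url(log_text):
    """Return the first URL in the notebook server's log output, or None."""
    match = URL_RE.search(log_text)
    return match.group(0) if match else None


if __name__ == "__main__":
    sample = (
        "[I 10:02:58.191 NotebookApp] 0 active kernels\n"
        "[I 10:02:58.191 NotebookApp] The IPython Notebook is running at: "
        "http://in-cn0658.mesabi:8888/\n"
    )
    print(internal_address())
    print(find_notebook_url(sample))  # http://in-cn0658.mesabi:8888/
```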


Tour of Notebook features
