Scientific Python

Eueung Mulyana
http://eueung.github.io/python/sci
Attribution-ShareAlike CC BY-SA

Agenda
1. Jupyter / IPython
2. NumPy
3. SciPy
4. matplotlib
5. pandas
6. SymPy
7. scikit-learn
8. jakevdp: The State of the Stack

Jupyter / IPython

IPython
- Powerful interactive shell
- Supports tab completion of just about everything
- Inline help system for modules, classes etc. with ?, source code with ??
- Browser-based notebook (Jupyter) with support for (runnable) code, text, mathematical expressions using LaTeX, inline plots etc.
- Could be used as computational lab notes/worksheets
- Magic functions to access the shell, run R code etc.
- Parallel computing

Notes on Jupyter

1. The Jupyter Notebook works with over 40 languages
2. Jupyter Notebooks render on GitHub

Jupyter: Computational Narratives

1. Computers are optimized for producing, consuming and processing data.

2. Humans are optimized for producing, consuming and processing narratives/stories.

3. For code and data to be useful to humans, we need tools for creating and sharing narratives that involve code and data.

The Jupyter Notebook is a tool for creating and sharing computational narratives.

Jupyter & Data Science
The Jupyter Notebook is a tool that allows us to explore the fundamental questions of Data Science

- with a particular dataset
- with code and data
- in a manner that produces a computational narrative that can be shared, reproduced, modified, and extended.

At the end of it all, those computational narratives encapsulate the goal or end point of Data Science. The character of the narrative (prediction, inference, data generation, insight, etc.) will vary from case to case.

The purpose of computing is insight, not numbers.

Hamming, Richard (1962). Numerical Methods for Scientists and

NumPy

NumPy
NumPy is the fundamental package for scientific computing with Python. It contains, among other things:

- A powerful N-dimensional array object
- Sophisticated (broadcasting) functions
- Tools for integrating C/C++ and Fortran code
- Useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data.

Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
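
To make "arbitrary data-types" concrete, here is a minimal sketch of a structured dtype with named fields. The record layout (name, age, weight) and the values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical record layout: a name (up to 10 unicode chars), an age and a weight
person = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f8")])

people = np.array([("Alice", 31, 55.5), ("Bob", 42, 80.1)], dtype=person)

# Each field is accessed by name and comes back as an ordinary typed array
print(people["name"])        # ['Alice' 'Bob']
print(people["age"].mean())  # 36.5
```

The whole table lives in one contiguous block of memory, which is what lets NumPy exchange such records efficiently with binary files and database drivers.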

NumPy provides a powerful N-dimensional array object

- Operations on these arrays are fast: they run in compiled code, and linear algebra calls out to well-optimised libraries (BLAS, ATLAS, MKL)
- NumPy is tolerant of Python lists: most functions accept a list and convert it to an array

NumPy builds on years of computer-based numerical analysis and problem solving
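
The capabilities listed above (broadcasting, linear algebra, Fourier transforms, random numbers) can be sketched in a few lines. The arrays are arbitrary example values, and the seeded `default_rng` generator assumes NumPy 1.17 or later.

```python
import numpy as np

# Broadcasting: a (3, 1) column and a (3,) row combine into a (3, 3) table
col = np.arange(3).reshape(3, 1)
row = np.arange(3)
table = col * 10 + row
print(table)
# [[ 0  1  2]
#  [10 11 12]
#  [20 21 22]]

# Linear algebra: solve A @ x = b (the exact solution is x = [2, 3])
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# Fourier transform of one period of a cosine sampled at 8 points:
# magnitude N/2 = 4 appears at bins 1 and 7, roughly 0 elsewhere
signal = np.cos(2 * np.pi * np.arange(8) / 8)
spectrum = np.fft.fft(signal)

# Random numbers from a seeded generator (reproducible across runs)
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=5)
```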

import numpy as np

a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"
print(a.shape)            # Prints "(3,)"
print(a[0], a[1], a[2])   # Prints "1 2 3"
a[0] = 5                  # Change an element of the array
print(a)                  # Prints "[5 2 3]"

b = np.array([[1, 2, 3], [4, 5, 6]])   # Create a rank 2 array
print(b.shape)                         # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])       # Prints "1 2 4"

# -----
a = np.zeros((2, 2))   # Create an array of all zeros
print(a)               # Prints "[[0. 0.]
                       #          [0. 0.]]"

b = np.ones((1, 2))    # Create an array of all ones
print(b)               # Prints "[[1. 1.]]"

c = np.full((2, 2), 7) # Create a constant array (integer dtype, inferred from 7)
print(c)               # Prints "[[7 7]
                       #          [7 7]]"

d = np.eye(2)          # Create a 2x2 identity matrix
print(d)               # Prints "[[1. 0.]
                       #          [0. 1.]]"

e = np.random.random((2, 2))  # Create an array filled with random values
print(e)                      # Might print "[[0.91940167 0.08143941]
                              #              [0.68744134 0.87236687]]"

Numpy
Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object (MATLAB style), and tools for working with these arrays.

Arrays
- A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers.
- The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.
- We can initialize numpy arrays from nested Python lists, and access elements using square brackets.
- Numpy also provides many functions to create arrays.
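
Square brackets do more than single-element access: they also take slices and boolean masks. A short sketch with an arbitrary 3x4 array, also showing that a basic slice is a view onto the original data rather than a copy:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # rows [0..3], [4..7], [8..11]
print(a.ndim, a.shape, a.dtype.kind)  # rank 2, shape (3, 4), integer kind 'i'

# Slicing: rows 0-1, columns 1-2
sub = a[:2, 1:3]
print(sub)
# [[1 2]
#  [5 6]]

# Boolean mask: select every element greater than 6
print(a[a > 6])  # [ 7  8  9 10 11]

# sub is a view, so writing through it modifies a as well
sub[0, 0] = 99
print(a[0, 1])  # 99
```

Use `sub.copy()` when an independent array is needed.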

SciPy

SciPy
SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering. SciPy core packages: IPython, NumPy, SciPy Library, SymPy, matplotlib, pandas.

SciPy Library
SciPy is a collection of mathematical algorithms and convenience functions built on top of NumPy. It includes modules for statistics, integration & ODE solvers, linear algebra, optimization, FFT, etc.

We use the terms SciPy and SciPy Library interchangeably; the meaning depends on context.

SciPy is a toolbox for researchers and scientists, and it contains many hidden treasures for them.
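
As a taste of the integration & ODE solvers, the sketch below uses `scipy.integrate.solve_ivp` (available in SciPy 1.0 and later) on exponential decay, dy/dt = -k*y. Its exact solution y0*exp(-k*t) makes the result easy to check; the rate k = 0.5 and the tolerances are arbitrary choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

k = 0.5  # hypothetical decay rate

def decay(t, y):
    """Right-hand side of dy/dt = -k * y."""
    return -k * y

# Integrate from t=0 to t=10 starting at y(0)=2, reporting 5 time points
t_eval = np.linspace(0, 10, 5)
sol = solve_ivp(decay, (0, 10), [2.0], t_eval=t_eval, rtol=1e-8, atol=1e-10)

# Compare with the closed-form solution
exact = 2.0 * np.exp(-k * t_eval)
print(np.max(np.abs(sol.y[0] - exact)))  # small, on the order of the tolerances
```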

SciPy & NumPy
NumPy provides a high-performance multidimensional array and basic tools to compute with and manipulate these arrays.

SciPy builds on this, and provides a large number of functions that operate on NumPy arrays and are useful for different types of scientific and engineering applications.

SciPy provides numerous numerical routines that run efficiently on top of NumPy arrays for optimization, signal processing, linear algebra and much more. It also provides some convenient data structures, such as compressed sparse matrices and spatial data structures. If you have already used some scikits (scikit-learn, scikit-image), you have already used SciPy extensively.

A few thoughts on SciPy:

- Contains linear algebra routines that overlap with NumPy; SciPy's linear algebra routines are always compiled with optimized LAPACK/BLAS support (ATLAS, Intel Math Kernel Library, etc.), while for NumPy this is optional
- Sparse matrix support
- Extends NumPy's statistical capabilities
- Under active development, new toys added constantly!
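
The sparse matrix support can be illustrated with `scipy.sparse`. The tridiagonal matrix below is a made-up example; `scipy.sparse.linalg.spsolve` solves the linear system while only the nonzero entries are ever stored.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

n = 5
# Tridiagonal matrix: 2 on the main diagonal, -1 on the two neighbouring diagonals,
# stored in compressed sparse row (CSR) format
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
print(A.nnz)  # 13 stored nonzeros instead of 25 dense entries

b = np.ones(n)
x = spsolve(A, b)

# Check the solution by multiplying back
print(np.allclose(A @ x, b))  # True
```

For a 5x5 matrix the savings are trivial, but the same code scales to systems with millions of unknowns where a dense matrix would not fit in memory.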

SciPy
A big box of tools:

- Special functions (scipy.special)
- Integration (scipy.integrate)
- Optimization (scipy.optimize)
- Interpolation (scipy.interpolate)
- Fourier Transforms (scipy.fftpack)
- Signal Processing (scipy.signal)
- Statistics (scipy.stats)
- Linear Algebra (scipy.linalg)
- File IO (scipy.io)
- Sparse Eigenvalue Problems with ARPACK
- Compressed Sparse Graph Routines (scipy.sparse.csgraph)
- Spatial data structures and algorithms (scipy.spatial)
- Multi-dimensional image processing (scipy.ndimage)
- Weave (scipy.weave, no longer part of current SciPy releases)

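import numpy as np
from scipy.stats import linregress

x = np.arange(10, dtype=float)                     # hypothetical sample data (not shown on the slide)
noisy_y = 2 * x + 1 + np.random.normal(0, 1, 10)   # hypothetical noisy linear data
(slope, intercept, r, p, se) = linregress(x, noisy_y)

# ---

from scipy.stats import spearmanr, pearsonr

x_cubed = x ** 3
x_cubed += np.random.normal(0, 3, 10)
print(spearmanr(x, x_cubed))   # rank (monotonic) correlation
print(pearsonr(x, x_cubed))    # linear correlation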
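
Two more entries from the toolbox list, `scipy.integrate` and `scipy.optimize`, in a minimal sketch. The integrand and the objective function are arbitrary examples, picked because their answers are known in closed form.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

# Definite integral of sin(x) from 0 to pi; the exact value is 2
value, abs_err = quad(np.sin, 0, np.pi)
print(value)

# Minimize the quadratic bowl f(x, y) = (x - 1)^2 + 10 * (y - 2)^2,
# whose minimum is at (1, 2)
def f(p):
    x, y = p
    return (x - 1) ** 2 + 10 * (y - 2) ** 2

result = minimize(f, x0=[0.0, 0.0])
print(result.x)  # close to [1, 2]
```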

matplotlib

matplotlib
A plotting library that renders high-quality 2D and 3D plots for Python.

- pyplot implements MATLAB-style plotting
- Object-oriented API for more advanced graphics
- The API mimics the MATLAB one in many ways, easing the transition for MATLAB users moving to Python
- Once again, no surprises: matplotlib is a very stable and mature project (expect one major release per year)

Inline plots in the notebook (older IPython versions):

ipython notebook --pylab inline

In current Jupyter notebooks, run the %matplotlib inline magic in a cell instead.

import numpy as np
import matplotlib.pyplot as plt

# Compute the x and y coordinates for points on a sine curve
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)

# Plot the points using matplotlib
plt.plot(x, y)
plt.show()  # You must call plt.show() to make graphics appear.

import numpy as np
import matplotlib.pyplot as plt

# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# Plot the points using matplotlib
plt.plot(x, y_sin)
plt.plot(x, y_cos)
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.title('Sine and Cosine')
plt.legend(['Sine', 'Cosine'])
plt.show()

matplotlib
matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, error charts, scatterplots, etc., with just a few lines of code.

For simple plotting, the pyplot interface provides a MATLAB-like interface, particularly when combined with IPython. For the power user, you have full control of line styles, font properties, axes properties, etc., via an object-oriented interface or via a set of functions familiar to MATLAB users.

With just a little bit of extra work we can easily plot a more complex chart, e.g. multiple lines at once, and add a title, legend, and axis labels.
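
The object-oriented interface works on explicit Figure and Axes objects instead of pyplot's implicit "current figure". A minimal sketch that redraws the sine/cosine chart this way; the Agg backend and the output file name `sine_cosine.png` are assumptions made so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 3 * np.pi, 0.1)

# Create one Figure containing a single Axes, then call methods on the Axes
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="Sine", linestyle="-")
ax.plot(x, np.cos(x), label="Cosine", linestyle="--")
ax.set_xlabel("x axis label")
ax.set_ylabel("y axis label")
ax.set_title("Sine and Cosine")
ax.legend()

fig.savefig("sine_cosine.png", dpi=100)  # hypothetical output path
plt.close(fig)
```

Holding on to `fig` and `ax` makes it straightforward to build multi-panel figures (e.g. `plt.subplots(2, 1)`) and to style each panel independently.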

Notes

TL;DR
- NumPy is the foundation
- SciPy is built upon NumPy, with some overlapping functionality
- matplotlib complements both

NumPy, SciPy, matplotlib
- NumPy is the foundation of scientific and numerical computing with Python
- SciPy is a collection of mathematical and scientific tools
- matplotlib is a technical plotting package

NumPy Arrays
- Implemented in C for efficiency
- Python indexing and slicing
- Elements are strongly typed

Taking advantage of NumPy
- Think in parallel!
- Replace loops with vector operations

matplotlib
- Primarily 2D plotting
- Basic 3D plots available with mplot3d (import mpl_toolkits.mplot3d)
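
"Replace loops with vector operations" in practice: the two functions below compute the same sum of squared differences, first with an explicit Python loop and then with whole-array NumPy operations. The function names and array sizes are illustrative.

```python
import numpy as np

def sq_diff_loop(a, b):
    """Sum of squared differences with an explicit Python loop."""
    total = 0.0
    for i in range(len(a)):
        total += (a[i] - b[i]) ** 2
    return total

def sq_diff_vectorized(a, b):
    """Same computation as whole-array operations executed in compiled code."""
    return np.sum((a - b) ** 2)

rng = np.random.default_rng(42)
a = rng.random(100_000)
b = rng.random(100_000)

# Both agree up to floating-point summation order; the vectorized version is
# typically much faster because the per-element work happens in C, not Python.
print(np.isclose(sq_diff_loop(a, b), sq_diff_vectorized(a, b)))  # True
```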

Other Notes
NumPy/SciPy/scikit-learn rely on many low-level Fortran/C libraries such as BLAS, ATLAS, the Intel MKL…

- most of these libraries are shipped unoptimized by your favorite OS (well, maybe not the case for Mac)
- you may want to re-compile these libraries or use a packaged Python distribution (Anaconda, Canopy)
- libraries for performance: numba, cython, ...

pandas

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

pandas
- "R for Python"
- Provides easy-to-use data structures & a ton of useful helper functions for data cleanup and transformations
- Fast! (backed by NumPy arrays)
- Integrates well with other libraries, e.g. scikit-learn

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

s = pd.Series([1, 3, 5, np.nan, 6, 8])
dates = pd.date_range('20130101', periods=6)
# The next two statements were cut off on the slide; they are completed here
# following the pandas "10 minutes to pandas" tutorial.
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

pandas
- pandas provides the DataFrame class, which is very similar to a data.frame in R
- Built on top of NumPy arrays, and allows mixed column types
- Copes well with missing values (unlike NumPy)
- Intelligently matches on columns/indices (supports SQL-like joins etc.)
- Read and write .csv, .xls, HTML tables etc.
- Lots of useful data analysis tools built in
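
Missing-value handling, SQL-like joins and group-by aggregation can be sketched in a few lines. The column names and values below are invented for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s.mean())              # 4.6, the NaN is skipped by default
print(s.fillna(0).tolist())  # [1.0, 3.0, 5.0, 0.0, 6.0, 8.0]

# SQL-like inner join on a shared key column
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})
joined = pd.merge(left, right, on="key", how="inner")
print(joined)  # only the keys present in both frames ('b' and 'c') survive

# Group-by aggregation, similar to SQL GROUP BY
sales = pd.DataFrame({"city": ["A", "B", "A"], "amount": [10, 5, 7]})
totals = sales.groupby("city")["amount"].sum()
print(totals)  # totals per city: A -> 17, B -> 5
```

Passing `how="left"`, `"right"` or `"outer"` to `pd.merge` gives the other SQL join types.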

SymPy

SymPy
SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible.

SymPy is written entirely in Python; its only required dependency is the mpmath library.

import sympy
sympy.sqrt(8)
# 2*sqrt(2)

from sympy import symbols
x, y = symbols('x y')
expr = x + 2*y
expr
# x + 2*y

expr - x
# 2*y
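
A few other core SymPy operations (expansion, differentiation, definite integration, equation solving) in a short sketch; the expressions are arbitrary examples.

```python
from sympy import symbols, expand, diff, integrate, solve, sin, cos, pi

x, y = symbols('x y')

# Algebra: expand a product symbolically
expanded = expand((x + 2*y)**2)
print(expanded)

# Calculus: product-rule derivative and an exact definite integral
derivative = diff(x * sin(x), x)
print(derivative)
print(integrate(cos(x), (x, 0, pi / 2)))  # 1

# Solve x**2 - 4 = 0 for x
print(solve(x**2 - 4, x))  # [-2, 2]
```

All results are exact symbolic objects rather than floating-point approximations; call `.evalf()` on an expression when a numeric value is needed.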

scikit-learn

scikit-learn
- Machine Learning algorithms implemented in Python on top of NumPy & SciPy
- Conveniently maintains the same interface to a wide range of algorithms
- Includes algorithms for: Classification, Regression, Clustering, Dimensionality reduction
- As well as lots of useful utilities (cross-validation, preprocessing etc.)

from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()

print(digits.data)
digits.target
digits.images[0]

from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(digits.data[:-1], digits.target[:-1])
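
After fitting, the classifier can predict the one image that was held out of training (the last one), and `cross_val_score` gives a less optimistic accuracy estimate than a single split. A sketch, assuming scikit-learn 0.18 or later for the `model_selection` module; the 5-fold setting is an arbitrary choice.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001, C=100.)

# Fit on all but the last sample, then predict the held-out last one
clf.fit(digits.data[:-1], digits.target[:-1])
predicted = clf.predict(digits.data[-1:])
print(predicted, digits.target[-1])  # predicted label vs. true label

# 5-fold cross-validation: fit and score a fresh model on 5 train/test splits
scores = cross_val_score(svm.SVC(gamma=0.001, C=100.),
                         digits.data, digits.target, cv=5)
print(scores.mean())
```

Because every estimator shares the same `fit`/`predict` interface, swapping `svm.SVC` for another classifier leaves the rest of the code unchanged.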

The State of the Stack

Many More Tools ...
- Performance: Numba, Weave, Numexpr, Theano, ...
- Visualization: Bokeh, Seaborn, Plotly, Chaco, mpld3, ggplot, MayaVi, vincent, toyplot, HoloViews, ...
- Data Structures & Computation: Blaze, Dask, DistArray, XRay, Graphlab, SciDBpy, pySpark, ...
- Packaging & distribution: pip/wheels, conda, EPD, Canopy, Anaconda, ...

References
1. Brian Granger: Project Jupyter as a Foundation for Open Data Science
2. Juan Luis Cano Rodriguez: IPython: How a notebook is changing science | Python as a real alternative to MATLAB, Mathematica and other commercial software
3. Olivier Hervieu: Introduction to scientific programming in python
4. CS231n: IPython Tutorial, http://cs231n.github.io/ipython-tutorial/
5. J.R. Johansson: Introduction to scientific computing with Python
6. pycam: Introduction to solving biological problems with Python
7. Jake VanderPlas: The State of the Stack
