Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPy
Kimikazu Kato
PyCon JP 2014
September 13, 2014
1 / 35
About Myself
Kimikazu Kato, Chief Scientist at Silver Egg Technology Co., Ltd.
Ph.D. in Computer Science
Background in mathematics, numerical computation, algorithms, etc.
<2 years' experience in Python; >10 years' experience in numerical computation
Now designing algorithms for a recommendation system, and doing research on machine learning and data analysis.
2 / 35
This talk...
is about effective usage of NumPy/SciPy
is NOT an exhaustive introduction of their capabilities, but shows some case studies based on my experience and interests
3 / 35
Table of Contents
Introduction
Basics of NumPy
  Broadcasting
  Indexing
Sparse matrices
  Usage of scipy.sparse
  Internal structure
Case studies
Conclusion
4 / 35
Numerical Computation
Differential equations
Simulations
Signal processing
Machine learning
etc...
Why Numerical Computation in Python?
Productivity
  Easy to write
  Easy to debug
Connectivity with visualization tools
  Matplotlib
  IPython
Connectivity with web systems
  Many frameworks (Django, Pyramid, Flask, Bottle, etc.)
5 / 35
But Python is Very Slow!
Code in C
#include <stdio.h>

int main() {
    int i;
    double s = 0;
    for (i = 1; i <= 100000000; i++)
        s += i;
    printf("%.0f\n", s);
}
Code in Python
s=0.
for i in xrange(1,100000001):
    s+=i
print s
Both of the codes compute the sum of integers from 1 to 100,000,000.
Result of a benchmark in a certain environment:
Above (C): 0.109 sec (compiled with the -O3 option)
Below (Python): 8.657 sec (80+ times slower!!)
6 / 35
Better code
import numpy as np
a=np.arange(1,100000001)
print a.sum()
Now it takes 0.188 sec. (Measured by the "time" command on Linux, loading time included.)
Still slower than C, but sufficiently fast for a scripting language.
7 / 35
Lessons
Python is very slow when written badly.
Translating C (or Java, C#, etc.) code into Python literally is often a bad idea.
Python-friendly rewriting sometimes results in a drastic performance improvement.
8 / 35
Basic rules for better performance
Avoid for loops as far as possible
Utilize libraries' capabilities instead
Forget about the cost of copying memory
  A typical C programmer might care about it, but ...
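As a small illustration of the first two rules (my own sketch, not from the slides), here is the same computation written both ways:

```python
import numpy as np

def sum_of_squares_loop(v):
    # Element-by-element loop: every iteration pays Python interpreter overhead.
    s = 0.0
    for x in v:
        s += x * x
    return s

def sum_of_squares_vec(v):
    # One vectorized call: the loop runs in compiled code inside NumPy.
    return np.dot(v, v)

v = np.arange(1, 1001, dtype=np.float64)
print(sum_of_squares_loop(v))  # 333833500.0
print(sum_of_squares_vec(v))   # 333833500.0
```

Both give the same answer; on large arrays the vectorized version is typically orders of magnitude faster.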
9 / 35
Basic techniques for NumPy
BroadcastingIndexing
10 / 35
Broadcasting
>>> import numpy as np
>>> a=np.array([0,1,2])
>>> a*3
array([0, 3, 6])
>>> b=np.array([1,4,9])
>>> np.sqrt(b)
array([ 1.,  2.,  3.])
A function which, when applied to an array, is applied to each element is called a universal function.
11 / 35
Broadcasting (2D)
>>> import numpy as np
>>> a=np.arange(9).reshape((3,3))
>>> b=np.array([1,2,3])
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> b
array([1, 2, 3])
>>> a*b
array([[ 0,  2,  6],
       [ 3,  8, 15],
       [ 6, 14, 24]])
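Broadcasting also works along the other axis. As a small addition of my own (not on the slide), reshaping b into a column broadcasts it across columns instead of rows:

```python
import numpy as np

a = np.arange(9).reshape((3, 3))
b = np.array([1, 2, 3])

# b[:, np.newaxis] has shape (3, 1), so row i of a is multiplied by b[i]:
scaled = a * b[:, np.newaxis]
print(scaled)
# [[ 0  1  2]
#  [ 6  8 10]
#  [18 21 24]]
```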
12 / 35
Indexing
>>> import numpy as np
>>> a=np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> indices=np.arange(0,10,2)
>>> indices
array([0, 2, 4, 6, 8])
>>> a[indices]=0
>>> a
array([0, 1, 0, 3, 0, 5, 0, 7, 0, 9])
>>> b=np.arange(100,600,100)
>>> b
array([100, 200, 300, 400, 500])
>>> a[indices]=b
>>> a
array([100,   1, 200,   3, 300,   5, 400,   7, 500,   9])
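A closely related technique (my own addition, not on the slide) is boolean-mask indexing, which selects or updates elements by a condition rather than by position:

```python
import numpy as np

a = np.arange(10)
mask = a % 2 == 1        # boolean array: True at the odd elements
print(a[mask])           # select the odd elements: [1 3 5 7 9]
a[mask] = -1             # assign through the mask
print(a)                 # [ 0 -1  2 -1  4 -1  6 -1  8 -1]
```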
13 / 35
References
Gabriele Lanaro, "Python High Performance Programming," Packt Publishing, 2013.
Stéfan van der Walt, "NumPy Medkit."
14 / 35
Sparse matrix
Defined as a matrix in which most elements are zero
A compressed data structure is used to express it, so that it will be...
  Space effective
  Time effective
15 / 35
scipy.sparse
The module scipy.sparse has mainly three types as expressions of a sparse matrix. (There are other types, but they are not mentioned here.)

lil_matrix : convenient to set data; setting a[i,j] is fast
csr_matrix : convenient for computation; fast to retrieve a row
csc_matrix : convenient for computation; fast to retrieve a column

Usually, set the data into a lil_matrix, and then convert it to a csc_matrix or csr_matrix.

For csr_matrix and csc_matrix, calculation between matrices of the same type is fast, but you should avoid calculation between different types.
16 / 35
Use case
>>> from scipy.sparse import lil_matrix, csr_matrix
>>> a=lil_matrix((3,3))
>>> a[0,0]=1.; a[0,2]=2.
>>> a=a.tocsr()
>>> print a
  (0, 0)    1.0
  (0, 2)    2.0
>>> a.todense()
matrix([[ 1.,  0.,  2.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])
>>> b=lil_matrix((3,3))
>>> b[1,1]=3.; b[2,0]=4.; b[2,2]=5.
>>> b=b.tocsr()
>>> b.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  3.,  0.],
        [ 4.,  0.,  5.]])
>>> c=a.dot(b)
>>> c.todense()
matrix([[ 8.,  0., 10.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])
>>> d=a+b
>>> d.todense()
matrix([[ 1.,  0.,  2.],
        [ 0.,  3.,  0.],
        [ 4.,  0.,  5.]])
17 / 35
Internal structure: csr_matrix
>>> from scipy.sparse import lil_matrix, csr_matrix
>>> a=lil_matrix((3,3))
>>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5.
>>> b=a.tocsr()
>>> b.todense()
matrix([[ 0.,  1.,  2.],
        [ 0.,  0.,  3.],
        [ 4.,  5.,  0.]])
>>> b.indices
array([1, 2, 2, 0, 1], dtype=int32)
>>> b.data
array([ 1.,  2.,  3.,  4.,  5.])
>>> b.indptr
array([0, 2, 3, 5], dtype=int32)
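To make the meaning of the three arrays concrete, here is a small sketch of my own (not from the slides) that recovers the nonzeros of row i from data, indices, and indptr:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Same matrix as above:
# [[0, 1, 2],
#  [0, 0, 3],
#  [4, 5, 0]]
b = csr_matrix(np.array([[0., 1., 2.],
                         [0., 0., 3.],
                         [4., 5., 0.]]))

def row_nonzeros(m, i):
    # The nonzeros of row i live in the half-open slice indptr[i]:indptr[i+1]
    # of both indices (column numbers) and data (values).
    start, end = m.indptr[i], m.indptr[i + 1]
    return [(int(j), float(v)) for j, v in zip(m.indices[start:end], m.data[start:end])]

print(row_nonzeros(b, 0))  # [(1, 1.0), (2, 2.0)]
print(row_nonzeros(b, 2))  # [(0, 4.0), (1, 5.0)]
```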
18 / 35
Internal structure: csc_matrix
>>> from scipy.sparse import lil_matrix, csc_matrix
>>> a=lil_matrix((3,3))
>>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5.
>>> b=a.tocsc()
>>> b.todense()
matrix([[ 0.,  1.,  2.],
        [ 0.,  0.,  3.],
        [ 4.,  5.,  0.]])
>>> b.indices
array([2, 0, 2, 0, 1], dtype=int32)
>>> b.data
array([ 4.,  1.,  5.,  2.,  3.])
>>> b.indptr
array([0, 1, 3, 5], dtype=int32)
19 / 35
Merit of knowing the internal structure
Setting a csr_matrix or csc_matrix via its internal structure is much faster than setting a lil_matrix with indices.

See the benchmark of setting

\begin{pmatrix}
2 & 1 &        &        &   \\
  & 2 & 1      &        &   \\
  &   & \ddots & \ddots &   \\
  &   &        & \ddots & 1 \\
  &   &        &        & 2
\end{pmatrix}
20 / 35
from scipy.sparse import lil_matrix, csr_matrix
import numpy as np
from timeit import timeit

def set_lil(n):
    a=lil_matrix((n,n))
    for i in xrange(n):
        a[i,i]=2.
        if i+1<n:
            a[i,i+1]=1.
    return a

def set_csr(n):
    data=np.empty(2*n-1)
    indices=np.empty(2*n-1,dtype=np.int32)
    indptr=np.empty(n+1,dtype=np.int32)
    # to be fair, a for loop is intentionally used
    # (using the indexing technique is faster)
    for i in xrange(n):
        indices[2*i]=i
        data[2*i]=2.
        if i<n-1:
            indices[2*i+1]=i+1
            data[2*i+1]=1.
        indptr[i]=2*i
    indptr[n]=2*n-1
    a=csr_matrix((data,indices,indptr),shape=(n,n))
    return a

print "lil:",timeit("set_lil(10000)",
    number=10,setup="from __main__ import set_lil")
print "csr:",timeit("set_csr(10000)",
    number=10,setup="from __main__ import set_csr")
21 / 35
Result:
lil: 11.6730761528
csr: 0.0562081336975
Remark
When you deal with already sorted data, setting a csr_matrix or csc_matrix with data, indices, and indptr is much faster than setting a lil_matrix.
But the code tends to be more complicated if you use the internal structure of csr_matrix or csc_matrix.
22 / 35
Case Studies
23 / 35
Case 1: Norms

\|v\|^2 = \sum_i v_i^2

If v is dense:

norm=np.dot(v,v)

Expressed as a product of matrices. (dot means matrix product, but you don't have to take the transpose explicitly.)

When v is sparse, suppose that v is expressed as a 1 × n matrix:

norm=v.multiply(v).sum()

(multiply() is the element-wise product)

This is because taking the transpose of a sparse matrix changes the type.
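The type change mentioned above is easy to verify; this is my own check, not on the slide:

```python
import numpy as np
from scipy.sparse import csr_matrix

v = csr_matrix(np.array([[1., 0., 2.]]))  # a sparse 1 x n "row vector"

# Transposing a CSR matrix yields a CSC matrix (the same data reinterpreted),
# so v.T.dot(v) would mix types; v.multiply(v).sum() avoids that.
print(type(v.T).__name__)    # csc_matrix
print(v.multiply(v).sum())   # 5.0 = 1^2 + 2^2
```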
24 / 35
Frobenius norm:

\|A\|_{\mathrm{Fro}}^2 = \sum_{ij} a_{ij}^2

norm=a.multiply(a).sum()
25 / 35
Case 2: Applying a function to all of the elements of a sparse matrix
A universal function can be applied to a dense matrix:
>>> import numpy as np
>>> a=np.arange(9).reshape((3,3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> np.tanh(a)
array([[ 0.        ,  0.76159416,  0.96402758],
       [ 0.99505475,  0.9993293 ,  0.9999092 ],
       [ 0.99998771,  0.99999834,  0.99999977]])
This is convenient and fast.
However, we cannot do the same thing for a sparse matrix.
26 / 35
>>> import numpy as np
>>> from scipy.sparse import lil_matrix
>>> a=lil_matrix((3,3))
>>> a[0,0]=1.
>>> a[1,0]=2.
>>> b=a.tocsr()
>>> np.tanh(b)
<3x3 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>
This is because, for an arbitrary function, its application to a sparse matrix is not necessarily sparse.

However, if a universal function f satisfies f(0) = 0, the sparsity is preserved.

Then, how can we compute it?
27 / 35
Use the internal structure!!
The positions of the non-zero elements are not changed after the application of the function.
Keep indices and indptr, and just change data.
Solution:
b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape)
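A quick check of this trick (my own sketch, not from the slides) against the dense computation:

```python
import numpy as np
from scipy.sparse import csr_matrix

a = csr_matrix(np.array([[1., 0., 0.],
                         [2., 0., 0.],
                         [0., 0., 0.]]))

# tanh(0) == 0, so only the stored entries need to be transformed;
# indices and indptr (the sparsity pattern) are reused unchanged.
b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape)

# The result matches applying np.tanh to the dense form.
print(np.allclose(b.toarray(), np.tanh(a.toarray())))  # True
```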
28 / 35
Case 3: A formula which appears in a paper

In the algorithm for a recommendation system [1], the following formula appears:

A^T \cdot D \cdot A

where A is an n × f dense matrix, and D is a diagonal matrix defined from a given array (d_i) as:

D = \begin{pmatrix}
d_1 &     &        &     \\
    & d_2 &        &     \\
    &     & \ddots &     \\
    &     &        & d_n
\end{pmatrix}

Here, n (which corresponds to the number of users or items) is big and f (which means the number of latent factors) is small.

[1] Hu et al., "Collaborative Filtering for Implicit Feedback Datasets," ICDM, 2008.
29 / 35
Solution 1:
There is a special class dia_matrix to deal with a diagonal sparse matrix.
import scipy.sparse as sparse
import numpy as np

def f(a,d):
    """a: 2d array of shape (n,f), d: 1d array of length n"""
    dd=sparse.diags([d],[0])
    return np.dot(a.T,dd.dot(a))
30 / 35
Solution 2:
Pack a csr_matrix with data, indices, indptr:

data=d
indices=[0,1,...,n-1]
indptr=[0,1,...,n]

def g(a,d):
    n,f=a.shape
    data=d
    indices=np.arange(n)
    indptr=np.arange(n+1)
    dd=sparse.csr_matrix((data,indices,indptr),shape=(n,n))
    return np.dot(a.T,dd.dot(a))
31 / 35
Solution 3:
This is equivalent to broadcasting!

def h(a,d):
    return np.dot(a.T*d,a)

(A^T D)A =
\begin{pmatrix}
a_{11} & a_{21} & \cdots & a_{n1} \\
a_{12} & a_{22} & \cdots & a_{n2} \\
\vdots & \vdots &        & \vdots \\
a_{1f} & a_{2f} & \cdots & a_{nf}
\end{pmatrix}
\times
\begin{pmatrix}
d_1 &     &        &     \\
    & d_2 &        &     \\
    &     & \ddots &     \\
    &     &        & d_n
\end{pmatrix}
\times A
=
\begin{pmatrix}
a_{11}d_1 & a_{21}d_2 & \cdots & a_{n1}d_n \\
a_{12}d_1 & a_{22}d_2 & \cdots & a_{n2}d_n \\
\vdots    & \vdots    &        & \vdots    \\
a_{1f}d_1 & a_{2f}d_2 & \cdots & a_{nf}d_n
\end{pmatrix}
\times A
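As a sanity check (my own addition, not from the slides), the three solutions agree on small random input:

```python
import numpy as np
import scipy.sparse as sparse

def f(a, d):   # Solution 1: diagonal sparse matrix via sparse.diags
    dd = sparse.diags([d], [0])
    return np.dot(a.T, dd.dot(a))

def g(a, d):   # Solution 2: hand-packed csr_matrix
    n, nf = a.shape
    dd = sparse.csr_matrix((d, np.arange(n), np.arange(n + 1)), shape=(n, n))
    return np.dot(a.T, dd.dot(a))

def h(a, d):   # Solution 3: broadcasting, no sparse matrix at all
    return np.dot(a.T * d, a)

np.random.seed(0)
a = np.random.random((100, 5))
d = np.random.random(100)
print(np.allclose(f(a, d), g(a, d)) and np.allclose(g(a, d), h(a, d)))  # True
```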
32 / 35
Benchmark
def datagen(n,f):
    np.random.seed(0)
    a=np.random.random((n,f))
    d=np.random.random(n)
    return a,d

from timeit import timeit
print "dia_matrix   :",timeit("f(a,d)",number=10,
    setup="from __main__ import f,datagen; a,d=datagen(1000000,10)")
print "csr_matrix   :",timeit("g(a,d)",number=10,
    setup="from __main__ import g,datagen; a,d=datagen(1000000,10)")
print "broadcasting :",timeit("h(a,d)",number=10,
    setup="from __main__ import h,datagen; a,d=datagen(1000000,10)")
Result:
dia_matrix   : 1.60458707809
csr_matrix   : 1.32580018044
broadcasting : 1.30032682419
33 / 35
Conclusion
Try not to use for loops; use libraries' capabilities instead.
Knowledge about the internal structure of the sparse matrix is useful to extract further performance.
Mathematical derivation is important. The key is to find a mathematically equivalent and Python-friendly formula.
Computational speed does not always matter. Finding better code in a short time is also valuable; don't pursue speed more than you need.
34 / 35
Acknowledgment
I would like to thank @shima__shima, who gave me useful advice on Twitter.
35 / 35