Eugene Bourakov and Martti Vallila HeliBlanket LLC, California, USA.
Jussi Enkovaara Martti Louhivuori Python in High ...
Transcript of Jussi Enkovaara Martti Louhivuori Python in High ...
![Page 1: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/1.jpg)
import sys, ostry:
from Bio.PDB import PDBParser__biopython_installed__ = True
except ImportError:__biopython_installed__ = False
__default_bfactor__ = 0.0 # default B-factor__default_occupancy__ = 1.0 # default occupancy level__default_segid__ = '' # empty segment ID
class EOF(Exception):def __init__(self): pass
class FileCrawler:"""Crawl through a file reading back and forth without loadinganything to memory."""def __init__(self, filename):
try:self.__fp__ = open(filename)
except IOError:raise ValueError, "Couldn't open file '%s' for reading." % filename
self.tell = self.__fp__.tellself.seek = self.__fp__.seek
def prevline(self):try:
self.prev()
Python in High-Performance Computing
Jussi EnkovaaraMartti Louhivuori
CSC – IT Center for Science Ltd, FinlandJanuary 29-31, 2018
![Page 2: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/2.jpg)
All material (C) 2018 by the authors.This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License, http://creativecommons.org/licenses/by-nc-sa/3.0/
![Page 3: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/3.jpg)
PRACTICALITIES
Computing servers We will use either the classroom computers or CSC’s computing server Taito for the
exercises. You can use also your laptop if all the necessary Python modules (numpy,
cython, mpi4py etc.) are installed. To log in to Taito, use the provided tnrgXX
username and password, e.g.
% ssh –X [email protected]
For editing files you can use e.g. nano editor:
nano example.py
Also other popular editors (vim, emacs) are available.
Python environment at CSC Python is available on all computing servers (taito, sisu), in order to utilize scientific
computing packages use the module command to load correct Python environment.
Default Python version is 2.7, in this course we will be using Python 3.4, which can be
loaded in Taito as
% module load python-env/3.4.5
In the classroom computers module load is not needed.
Exercise material All the exercise material (skeleton codes, input files, model solutions, etc.) is
available in github at https://github.com/csc-training/hpc-python
The material can be downloaded (cloned) as
% git clone https://github.com/csc-training/hpc-python.git
Each exercise is contained in its own subdirectory, i.e.
numpy/reference-copy
numpy/one-dimensional-arrays
…
![Page 4: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/4.jpg)
General exercise instructions Simple exercises can be carried out directly in the interactive interpreter. For more
complex ones it is recommended to write the program into a .py file. Still, it is useful
to keep an interactive interpreter open for testing!
Some exercises contain references to functions/modules which are not addressed in
actual lectures. In these cases Python's interactive help (and google) are useful, e.g.
In [4]: help(numpy)
It is not necessary to complete all the exercises, instead you may leave some for
further study at home. Also, some Bonus exercises are provided in the end of the
sheet.
Visualisation In some exercises it might be convenient to do visualisations with matplotlib Python
package. Interactive plotting is most convenient with the IPython enhanced
interpreter. For enabling interactive plotting, start IPython with --matplotlib
argument:
% ipython –matplotlib
Simple x-y plots can then be done as:
In [1]: import matplotlib.pyplot as plt
…
In [6]: plt.plot(x,y) # line
In [7]: plt.plot(x,y, ’ro’) # individual points in red
Look matplotlib documentation for additional information for visualisation.
Parallel calculations In class room workstations, one needs to load the MPI environment before using
mpi4py:
% module load mpi/openmpi-x86_64
After that MPI parallel Python programs can be launched with mpirun, e.g. to run
with 4 MPI tasks one issues
% mpirun –np 4 python3 example.py
![Page 5: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/5.jpg)
In Taito one can launch interactive MPI programs with srun:
% srun -n4 python3 hello.py
Note that for real production calculations in Taito one should use batch job scripts,
see https://research.csc.fi/taito-user-guide
![Page 6: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/6.jpg)
Basic array manipulation
1. Reference vs. copy Investigate the behavior of the statements below by looking at the values of the
arrays a and b after assignments:
a = np.arange(5)
b = a
b[2] = -1
b = a[:]
b[1] = -1
b = a.copy()
b[0] = -1
2. One dimensional arrays Start from a Python list containing both integers and floating point values, and
construct then a NumPy array from the list.
Generate a 1D NumPy array containing all numbers from -2.0 to 2.0 with a spacing of
0.2. Use optional start and step arguments of np.arange() function.
Generate another 1D NumPy array containing 11 equally spaced values between 0.5
and 1.5. Extract every second element of the array.
3. Two dimensional arrays and slicing First, create a 4x4 array with arbitrary values, then
a) Extract every element from the second row.
b) Extract every element from the third column.
c) Assign a value of 0.21 to upper left 2x2 subarray.
Next, create a 8x8 array with checkerboard pattern, i.e. alternating zeros and ones:
(
1 0 1 …0 1 0 …1 0 1 …… … … …
)
![Page 7: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/7.jpg)
4. Splitting and combining arrays Continue with the previous 4x4 array
a) Use np.split() function for splitting the array into two new 2x4 arrays.
Reconstruct the original 4x4 array by using np.concatenate().
b) Repeat the above exercise but create now 4x2 subarrays and then combine them.
5. Subdiagonal matrices Create a 6x6 matrix with 1’s above and below the diagonal and zeros otherwise:
(
0 1 0 0 0 01 0 1 0 0 00 1 0 1 0 00 0 1 0 1 00 0 0 1 0 10 0 0 0 1 0)
Use the numpy.eye() –function.
![Page 8: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/8.jpg)
NumPy tools
6. Input and output File xy_coordinates.dat contains a list of (x,y) value pairs. Read the data with
numpy.loadtxt(). Add then 2.5 to all y values and write the new data into a file
using numpy.savetxt().
7. Polynomials Fit a second order polynomial to the data of previous exercises by using
numpy.polyfit(). Construct the values of the polynomial in the interval [-6,6]
with numpy.polyval(). You can use matplotlib for plotting both the original points
and the fitted polynomial.
8. Random numbers Generate a one dimensional 1000 element array of uniformly distributed random
numbers using numpy.random module.
Calculate the mean and standard deviation of the array using numpy.mean() and
numpy.std().
Choose some other random distribution and calculate its mean and standard
deviation.
You can visualize the random distributions with matplotlib’s hist() function.
9. Linear algebra Construct two symmetric 2x2 matrices A and B.
(hint: a symmetric matrix can be constructed easily as Asym = A + AT)
Calculate the matrix product C=A*B using numpy.dot().
Calculate the eigenvalues of matrix C with numpy.linalg.eigvals().
![Page 9: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/9.jpg)
Advanced NumPy
10. Advanced indexing Start with 10x10 array of uniformly distribute random numbers
(np.random.random). Find all the elements larger than 0.5 by using Boolean mask.
Find also the indices of elements larger than 0.5. You can use the above mask and
numpy.nonzero() for finding the indices. Check that the values obtained by using
the index array are the same as with the earlier direct Boolean indexing.
11. Translation with broadcasting File points_circle.dat contains x, y coordinates along a circle. Translate all the
coordinates with some vector e.g. (2.1, 1.1). Plot both the original and translated
points in order to see the effect of translation.
12. Finite-difference Derivatives can be calculated numerically with the finite-difference method as:
Construct 1D Numpy array containing the values of xi in the interval [0,π/2] with
spacing Δx=0.1. Evaluate numerically the derivative of sin in this interval (excluding
the end points) using the above formula. Try to avoid for loops. Compare the result
to function cos in the same interval.
13. Numerical integration A simple method for evaluating integrals numerically is by the middle Riemann sum
![Page 10: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/10.jpg)
with x’i = (xi + xi-1)/2. Use the same interval as in the first exercise and investigate how
much the Riemann sum of sin differs from 1.0. Avoid for loops. Investigate also
how the results changes with the choice of Δx.
14. Temporary arrays Try different NumPy array expressions and investigate how much memory is used in
temporary arrays. You can use the Unix command ‘/usr/bin/time –v ‘ for finding out
the maximum memory usage of program, or utilize the maxmem() function in
demos/memory_usage.py
15. Numexpr Try different array expressions and investigate how much numexpr can speed them
up. Try to vary also the number of threads used by numexpr. IPython and %timeit
magic can be useful in testing.
![Page 11: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/11.jpg)
Performance analysis
16. Using cProfile The file heat_simple.py contains (very inefficient) implementation of the two
dimensional heat equation. Use cProfile for investigating where the time is spent
in the program. You can try to profile also the more efficient model solution of
numpy/heat-equation
![Page 12: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/12.jpg)
Optimising with Cython
17. Type declarations Create a setup.py file in order to speed up the heat equation solver with Cython.
Based on the profile in the previous exercise, insert static type declarations for a
proper locations. Investigate the effect on performance. You can use applications
own timers and/or timeit. Annotated HTML-report with “cython –a …” can be
useful when tuning performance
18. C-functions, fast indexing and directives Continue optimising the code by reducing function call overhead, utilizing fast array
indexing and including compiler directives. When finished with the optimisation,
compare performance to Python/NumPy model solution of bonus_heat.py which
uses array operations. You can play around also with larger input data as provided in
bottle_medium.dat and bottle_large.dat.
![Page 13: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/13.jpg)
Interfacing with libraries
19. Interfacing with C a) The files evolve.h and evolve.c contain a pure C implementation of the single
time step in heat equation. The C implemention can be built into a shared library
with the provided Makefile by executing make command. Use cffi for utilizing
the library function instead of the Python function heat_simple.py . Compare
the performance to Cython implementation
b) Repeat the previous exercise, but use now Cython for interfacing.
20. Interfacing with Fortran The file evolve.f90 contain a pure Fortran implementation of the single time step
in heat equation. Use f2py for creating Python interface, and utilize the Fortran
implementation in heat_simple.py . You may need to insert f2py attributes into
the Fortran file.
![Page 14: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/14.jpg)
Multiprocessing
21. Simple calculation Calculate the square of all numbers 1...10 using a separate Process to calculate the
square of each number and print out the result.
22. Work distribution Read the atom coordinates of the Zika virus (5ire.pdb) using the simple parser
provided in pdb.py (see comments in the file for examples on how to use it). In
center-of-coords.py is a skeleton code you may also use as a starting point.
Calculate the center of coordinates in parallel by dividing the coordinates in smaller
chunks (e.g. 100 coordinates) to separate tasks. After completing the tasks in worker
processes, combine the results in the master process.
If needed, figure out also how to deal with any coordinates left out of the neat
uniform-sized chunks.
a) Use a Pool of Workers to distribute the tasks to worker processes. Hint: if you
return also the weight of each chunk, you can process also non-uniform sized chunks.
OR
b) Use a Queue to distribute the tasks to worker processes. Store the results for each
chunk in a separate queue.
![Page 15: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/15.jpg)
Parallel programming with mpi4py
23. Parallel “Hello World” Create a parallel Python program which prints out the number of processes and the
rank of each process.
24. Simple message exchange a) Write a simple program where two processes send and receive a message
to/from each other using send and recv. The message content is a dictionary
with a key {‘rank’ : myrank} where myrank is the rank of the sending process.
After receiving a message, each process should print out the rank of the process
and the value in the received dictionary.
b) In each process, construct a 100 000 element NumPy array which is initialized to
the rank of process. Send and receive the array using Send and Recv and after
receiving print out the rank of process together with the first element of the
received array. Investigate what happens when reordering the send and receive
calls in one of the processes.
25. Message chain Write a simple program where every processor sends data to the next one. Let
ntasks be the number of the tasks, and myid the rank of the current process. Your
program should work as follows:
Every task with a rank less than ntasks-1 sends a message to task myid+1.
For example, task 0 sends a message to task 1.
The message content is an integer array where each element is initialized to myid.
The message tag is the receiver’s id number.
The sender prints out the number of elements it sends and the tag number.
All tasks with rank ≥ 1 receive messages.
Each receiver prints out their myid, and the first element in the received array.
a) Implement the program described above using Send and Recv.
b) Use Sendrecv instead of Send and Recv.
c) Can the code be simplified using MPI.PROC_NULL?
![Page 16: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/16.jpg)
26. Non-blocking communication Go back to “Message chain” exercise and implement it using non-blocking
communication
27. Collective operations In this exercise we test different routines for collective communication.
a) First, write a program where rank 0 sends and array containing numbers from 0
to 7 to all the other ranks using collective communication.
Next, let us continue with four processes with following data vectors:
Task 0: 0 1 2 3 4 5 6 7
Task 1: 8 9 10 11 12 13 14 15
Task 2: 16 17 18 19 20 21 22 23
Task 3: 24 25 26 27 28 29 30 31
In addition, each task has a receive buffer for eight elements and the values in the
buffer are initialized to -1. Implement a program that sends and receives values from
the data vectors to receive buffers using a single collective communication routine
for each case, so that the receive buffers will have the following values:
b)
Task 0: 0 1 -1 -1 -1 -1 -1 -1
Task 1: 2 3 -1 -1 -1 -1 -1 -1
Task 2: 4 5 -1 -1 -1 -1 -1 -1
Task 3: 6 7 -1 -1 -1 -1 -1 -1 c)
Task 0: -1 -1 -1 -1 -1 -1 -1 -1
Task 1: 0 1 8 9 16 17 24 25
Task 2: -1 -1 -1 -1 -1 -1 -1 -1
Task 3: -1 -1 -1 -1 -1 -1 -1 -1 d)
Task 0: 8 10 12 14 16 18 20 22
Task 1: -1 -1 -1 -1 -1 -1 -1 -1
Task 2: 40 42 44 46 48 50 52 54
Task 3: -1 -1 -1 -1 -1 -1 -1 -1
You can start from scratch or use a skeleton file collectives.py.
Tip: create two
communicators
![Page 17: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/17.jpg)
![Page 18: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/18.jpg)
Bonus exercises
B1. Game of life Game of life is a cellular automaton devised by John Conway in 70's:
http://en.wikipedia.org/wiki/Conway's_Game_of_Life
The game consists of two dimensional orthogonal grid of cells. Cells are in two
possible states, alive or dead. Each cell interacts with its eight neighbours, and at
each time step the following transitions occur:
- Any live cell with fewer than two live neighbours dies, as if caused by
underpopulation
- Any live cell with more than three live neighbours dies, as if by overcrowding
- Any live cell with two or three live neighbours lives on to the next generation
- Any dead cell with exactly three live neighbours becomes a live cell
The initial pattern constitutes the seed of the system, and the system is left to evolve
according to rules. Deaths and births happen simultaneously.
Implement the Game of Life using Numpy. Try first 32x32 square grid and cross-
shaped initial pattern:
Try also other grids and initial patterns (e.g. random pattern).
![Page 19: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/19.jpg)
For visualization you can use Matplotlib:
import matplotlib.pyplot as plt
plt.imshow(array)
B2. Rotation with broadcasting Two dimensional coordinates can be rotated by my multiplying the 2-element vector
of coordinates by 2x2 rotation matrix
R=(cos (𝜃) −sin (𝜃)
sin (𝜃) cos (𝜃))
where 𝜃 is the angle of rotation (in radians). Start from the x-y coordinates in the file
points_circle.dat and rotate them by 90°. Utilize broadcasting for performing the
rotation with a single np.dot call. (Hint: you may need to create additional
dimensions). Plot both the original and rotated points in order to see the effect of
rotation.
B3. Two dimensional heat equation Heat (or diffusion) equation is
where u(x, y, t) is the temperature field that varies in space and time, and α is
thermal diffusivity constant. The two dimensional Laplacian can be discretized with
finite differences as
Given an initial condition (u(t=0) = u0) one can follow the time dependence of the
temperature field with explicit time evolution method:
(Note that algorithm is stable only when )
𝜕𝑢
𝜕𝑡= 𝛼𝛻2𝑢
𝛻2𝑢(𝑖, 𝑗) →𝑢(𝑖 − 1, 𝑗)− 2𝑢(𝑖, 𝑗) + 𝑢(𝑖 + 1,𝑗)
(𝛥𝑥)2
+ 𝑢(𝑖, 𝑗 − 1)− 2𝑢(𝑖, 𝑗) + 𝑢(𝑖, 𝑗 + 1)
(𝛥𝑦)2
𝑢𝑚+1(𝑖, 𝑗) = 𝑢𝑚(𝑖, 𝑗) + ∆𝑡 𝛼 𝛻2𝑢𝑚(𝑖, 𝑗)
∆𝑡 < 1
2 𝛼
(𝛥𝑥𝛥𝑦)2
(𝛥𝑥)2 + (𝛥𝑦)2
![Page 20: Jussi Enkovaara Martti Louhivuori Python in High ...](https://reader030.fdocuments.in/reader030/viewer/2022041219/625133efa9d8b46df047184b/html5/thumbnails/20.jpg)
Implement two dimensional heat equation with NumPy using the initial temperature
field in the file bottle.dat (the file consists of a header and 200 x 200 data array).
As a boundary condition use fixed values as given in the initial field. You can start
from the skeleton code in the file bonus_heat.py
B4. Parallel heat equation Parallelize the whole heat equation program with MPI, by dividing the grid in rows
and assigning one row block to one task. A domain decomposition, that is.
The tasks are able to update the grid independently everywhere else than on the row
boundaries – there the communication of a single row with the nearest neighbor is
needed. This is realized by having additional ghost layers that contain the boundary
data of the neighboring tasks. As the system is aperiodic, the outermost ranks
communicate with single neighbor, and the inner ranks with the two neighbors.
Insert the proper MPI routines into skeleton code heat_mpi.py (search
for“TODO”s).
A schematic representation of decomposition looks like (row wise composition is
obtained just by rotating 90 degrees):
portion grid
Rank #0 Rank #1 Rank #2 Rank #3
Local of the
Each rank has additional columns in
both ends of the local grid
(colored columns).
The first actual column of Rank #2 is
communicated to “rightmost”
ghostlayer of Rank #1.
Remember to update all ghost layers at each iteration.