Jussi Enkovaara Martti Louhivuori Python in High ...

20
import sys, os try: from Bio.PDB import PDBParser __biopython_installed__ = True except ImportError: __biopython_installed__ = False __default_bfactor__ = 0.0 # default B-factor __default_occupancy__ = 1.0 # default occupancy level __default_segid__ = '' # empty segment ID class EOF(Exception): def __init__(self): pass class FileCrawler: """ Crawl through a file reading back and forth without loading anything to memory. """ def __init__(self, filename): try: self.__fp__ = open(filename) except IOError: raise ValueError, "Couldn't open file '%s' for reading." % filename self.tell = self.__fp__.tell self.seek = self.__fp__.seek def prevline(self): try: self.prev() Python in High-Performance Computing Jussi Enkovaara Martti Louhivuori CSC – IT Center for Science Ltd, Finland January 29-31, 2018

Transcript of Jussi Enkovaara Martti Louhivuori Python in High ...

Page 1: Jussi Enkovaara Martti Louhivuori Python in High ...

import sys, ostry:

from Bio.PDB import PDBParser__biopython_installed__ = True

except ImportError:__biopython_installed__ = False

__default_bfactor__ = 0.0 # default B-factor__default_occupancy__ = 1.0 # default occupancy level__default_segid__ = '' # empty segment ID

class EOF(Exception):def __init__(self): pass

class FileCrawler:"""Crawl through a file reading back and forth without loadinganything to memory."""def __init__(self, filename):

try:self.__fp__ = open(filename)

except IOError:raise ValueError, "Couldn't open file '%s' for reading." % filename

self.tell = self.__fp__.tellself.seek = self.__fp__.seek

def prevline(self):try:

self.prev()

Python in High-Performance Computing

Jussi EnkovaaraMartti Louhivuori

CSC – IT Center for Science Ltd, FinlandJanuary 29-31, 2018

Page 2: Jussi Enkovaara Martti Louhivuori Python in High ...

All material (C) 2018 by the authors.This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License, http://creativecommons.org/licenses/by-nc-sa/3.0/

Page 3: Jussi Enkovaara Martti Louhivuori Python in High ...

PRACTICALITIES

Computing servers We will use either the classroom computers or CSC’s computing server Taito for the

exercises. You can use also your laptop if all the necessary Python modules (numpy,

cython, mpi4py etc.) are installed. To log in to Taito, use the provided tnrgXX

username and password, e.g.

% ssh –X [email protected]

For editing files you can use e.g. nano editor:

nano example.py

Also other popular editors (vim, emacs) are available.

Python environment at CSC Python is available on all computing servers (taito, sisu), in order to utilize scientific

computing packages use the module command to load correct Python environment.

Default Python version is 2.7, in this course we will be using Python 3.4, which can be

loaded in Taito as

% module load python-env/3.4.5

In the classroom computers module load is not needed.

Exercise material All the exercise material (skeleton codes, input files, model solutions, etc.) is

available in github at https://github.com/csc-training/hpc-python

The material can be downloaded (cloned) as

% git clone https://github.com/csc-training/hpc-python.git

Each exercise is contained in its own subdirectory, i.e.

numpy/reference-copy

numpy/one-dimensional-arrays

Page 4: Jussi Enkovaara Martti Louhivuori Python in High ...

General exercise instructions Simple exercises can be carried out directly in the interactive interpreter. For more

complex ones it is recommended to write the program into a .py file. Still, it is useful

to keep an interactive interpreter open for testing!

Some exercises contain references to functions/modules which are not addressed in

actual lectures. In these cases Python's interactive help (and google) are useful, e.g.

In [4]: help(numpy)

It is not necessary to complete all the exercises, instead you may leave some for

further study at home. Also, some Bonus exercises are provided in the end of the

sheet.

Visualisation In some exercises it might be convenient to do visualisations with matplotlib Python

package. Interactive plotting is most convenient with the IPython enhanced

interpreter. For enabling interactive plotting, start IPython with --matplotlib

argument:

% ipython –matplotlib

Simple x-y plots can then be done as:

In [1]: import matplotlib.pyplot as plt

In [6]: plt.plot(x,y) # line

In [7]: plt.plot(x,y, ’ro’) # individual points in red

Look matplotlib documentation for additional information for visualisation.

Parallel calculations In class room workstations, one needs to load the MPI environment before using

mpi4py:

% module load mpi/openmpi-x86_64

After that MPI parallel Python programs can be launched with mpirun, e.g. to run

with 4 MPI tasks one issues

% mpirun –np 4 python3 example.py

Page 5: Jussi Enkovaara Martti Louhivuori Python in High ...

In Taito one can launch interactive MPI programs with srun:

% srun -n4 python3 hello.py

Note that for real production calculations in Taito one should use batch job scripts,

see https://research.csc.fi/taito-user-guide

Page 6: Jussi Enkovaara Martti Louhivuori Python in High ...

Basic array manipulation

1. Reference vs. copy Investigate the behavior of the statements below by looking at the values of the

arrays a and b after assignments:

a = np.arange(5)

b = a

b[2] = -1

b = a[:]

b[1] = -1

b = a.copy()

b[0] = -1

2. One dimensional arrays Start from a Python list containing both integers and floating point values, and

construct then a NumPy array from the list.

Generate a 1D NumPy array containing all numbers from -2.0 to 2.0 with a spacing of

0.2. Use optional start and step arguments of np.arange() function.

Generate another 1D NumPy array containing 11 equally spaced values between 0.5

and 1.5. Extract every second element of the array.

3. Two dimensional arrays and slicing First, create a 4x4 array with arbitrary values, then

a) Extract every element from the second row.

b) Extract every element from the third column.

c) Assign a value of 0.21 to upper left 2x2 subarray.

Next, create a 8x8 array with checkerboard pattern, i.e. alternating zeros and ones:

(

1 0 1 …0 1 0 …1 0 1 …… … … …

)

Page 7: Jussi Enkovaara Martti Louhivuori Python in High ...

4. Splitting and combining arrays Continue with the previous 4x4 array

a) Use np.split() function for splitting the array into two new 2x4 arrays.

Reconstruct the original 4x4 array by using np.concatenate().

b) Repeat the above exercise but create now 4x2 subarrays and then combine them.

5. Subdiagonal matrices Create a 6x6 matrix with 1’s above and below the diagonal and zeros otherwise:

(

0 1 0 0 0 01 0 1 0 0 00 1 0 1 0 00 0 1 0 1 00 0 0 1 0 10 0 0 0 1 0)

Use the numpy.eye() –function.

Page 8: Jussi Enkovaara Martti Louhivuori Python in High ...

NumPy tools

6. Input and output File xy_coordinates.dat contains a list of (x,y) value pairs. Read the data with

numpy.loadtxt(). Add then 2.5 to all y values and write the new data into a file

using numpy.savetxt().

7. Polynomials Fit a second order polynomial to the data of previous exercises by using

numpy.polyfit(). Construct the values of the polynomial in the interval [-6,6]

with numpy.polyval(). You can use matplotlib for plotting both the original points

and the fitted polynomial.

8. Random numbers Generate a one dimensional 1000 element array of uniformly distributed random

numbers using numpy.random module.

Calculate the mean and standard deviation of the array using numpy.mean() and

numpy.std().

Choose some other random distribution and calculate its mean and standard

deviation.

You can visualize the random distributions with matplotlib’s hist() function.

9. Linear algebra Construct two symmetric 2x2 matrices A and B.

(hint: a symmetric matrix can be constructed easily as Asym = A + AT)

Calculate the matrix product C=A*B using numpy.dot().

Calculate the eigenvalues of matrix C with numpy.linalg.eigvals().

Page 9: Jussi Enkovaara Martti Louhivuori Python in High ...

Advanced NumPy

10. Advanced indexing Start with 10x10 array of uniformly distribute random numbers

(np.random.random). Find all the elements larger than 0.5 by using Boolean mask.

Find also the indices of elements larger than 0.5. You can use the above mask and

numpy.nonzero() for finding the indices. Check that the values obtained by using

the index array are the same as with the earlier direct Boolean indexing.

11. Translation with broadcasting File points_circle.dat contains x, y coordinates along a circle. Translate all the

coordinates with some vector e.g. (2.1, 1.1). Plot both the original and translated

points in order to see the effect of translation.

12. Finite-difference Derivatives can be calculated numerically with the finite-difference method as:

Construct 1D Numpy array containing the values of xi in the interval [0,π/2] with

spacing Δx=0.1. Evaluate numerically the derivative of sin in this interval (excluding

the end points) using the above formula. Try to avoid for loops. Compare the result

to function cos in the same interval.

13. Numerical integration A simple method for evaluating integrals numerically is by the middle Riemann sum

Page 10: Jussi Enkovaara Martti Louhivuori Python in High ...

with x’i = (xi + xi-1)/2. Use the same interval as in the first exercise and investigate how

much the Riemann sum of sin differs from 1.0. Avoid for loops. Investigate also

how the results changes with the choice of Δx.

14. Temporary arrays Try different NumPy array expressions and investigate how much memory is used in

temporary arrays. You can use the Unix command ‘/usr/bin/time –v ‘ for finding out

the maximum memory usage of program, or utilize the maxmem() function in

demos/memory_usage.py

15. Numexpr Try different array expressions and investigate how much numexpr can speed them

up. Try to vary also the number of threads used by numexpr. IPython and %timeit

magic can be useful in testing.

Page 11: Jussi Enkovaara Martti Louhivuori Python in High ...

Performance analysis

16. Using cProfile The file heat_simple.py contains (very inefficient) implementation of the two

dimensional heat equation. Use cProfile for investigating where the time is spent

in the program. You can try to profile also the more efficient model solution of

numpy/heat-equation

Page 12: Jussi Enkovaara Martti Louhivuori Python in High ...

Optimising with Cython

17. Type declarations Create a setup.py file in order to speed up the heat equation solver with Cython.

Based on the profile in the previous exercise, insert static type declarations for a

proper locations. Investigate the effect on performance. You can use applications

own timers and/or timeit. Annotated HTML-report with “cython –a …” can be

useful when tuning performance

18. C-functions, fast indexing and directives Continue optimising the code by reducing function call overhead, utilizing fast array

indexing and including compiler directives. When finished with the optimisation,

compare performance to Python/NumPy model solution of bonus_heat.py which

uses array operations. You can play around also with larger input data as provided in

bottle_medium.dat and bottle_large.dat.

Page 13: Jussi Enkovaara Martti Louhivuori Python in High ...

Interfacing with libraries

19. Interfacing with C a) The files evolve.h and evolve.c contain a pure C implementation of the single

time step in heat equation. The C implemention can be built into a shared library

with the provided Makefile by executing make command. Use cffi for utilizing

the library function instead of the Python function heat_simple.py . Compare

the performance to Cython implementation

b) Repeat the previous exercise, but use now Cython for interfacing.

20. Interfacing with Fortran The file evolve.f90 contain a pure Fortran implementation of the single time step

in heat equation. Use f2py for creating Python interface, and utilize the Fortran

implementation in heat_simple.py . You may need to insert f2py attributes into

the Fortran file.

Page 14: Jussi Enkovaara Martti Louhivuori Python in High ...

Multiprocessing

21. Simple calculation Calculate the square of all numbers 1...10 using a separate Process to calculate the

square of each number and print out the result.

22. Work distribution Read the atom coordinates of the Zika virus (5ire.pdb) using the simple parser

provided in pdb.py (see comments in the file for examples on how to use it). In

center-of-coords.py is a skeleton code you may also use as a starting point.

Calculate the center of coordinates in parallel by dividing the coordinates in smaller

chunks (e.g. 100 coordinates) to separate tasks. After completing the tasks in worker

processes, combine the results in the master process.

If needed, figure out also how to deal with any coordinates left out of the neat

uniform-sized chunks.

a) Use a Pool of Workers to distribute the tasks to worker processes. Hint: if you

return also the weight of each chunk, you can process also non-uniform sized chunks.

OR

b) Use a Queue to distribute the tasks to worker processes. Store the results for each

chunk in a separate queue.

Page 15: Jussi Enkovaara Martti Louhivuori Python in High ...

Parallel programming with mpi4py

23. Parallel “Hello World” Create a parallel Python program which prints out the number of processes and the

rank of each process.

24. Simple message exchange a) Write a simple program where two processes send and receive a message

to/from each other using send and recv. The message content is a dictionary

with a key {‘rank’ : myrank} where myrank is the rank of the sending process.

After receiving a message, each process should print out the rank of the process

and the value in the received dictionary.

b) In each process, construct a 100 000 element NumPy array which is initialized to

the rank of process. Send and receive the array using Send and Recv and after

receiving print out the rank of process together with the first element of the

received array. Investigate what happens when reordering the send and receive

calls in one of the processes.

25. Message chain Write a simple program where every processor sends data to the next one. Let

ntasks be the number of the tasks, and myid the rank of the current process. Your

program should work as follows:

Every task with a rank less than ntasks-1 sends a message to task myid+1.

For example, task 0 sends a message to task 1.

The message content is an integer array where each element is initialized to myid.

The message tag is the receiver’s id number.

The sender prints out the number of elements it sends and the tag number.

All tasks with rank ≥ 1 receive messages.

Each receiver prints out their myid, and the first element in the received array.

a) Implement the program described above using Send and Recv.

b) Use Sendrecv instead of Send and Recv.

c) Can the code be simplified using MPI.PROC_NULL?

Page 16: Jussi Enkovaara Martti Louhivuori Python in High ...

26. Non-blocking communication Go back to “Message chain” exercise and implement it using non-blocking

communication

27. Collective operations In this exercise we test different routines for collective communication.

a) First, write a program where rank 0 sends and array containing numbers from 0

to 7 to all the other ranks using collective communication.

Next, let us continue with four processes with following data vectors:

Task 0: 0 1 2 3 4 5 6 7

Task 1: 8 9 10 11 12 13 14 15

Task 2: 16 17 18 19 20 21 22 23

Task 3: 24 25 26 27 28 29 30 31

In addition, each task has a receive buffer for eight elements and the values in the

buffer are initialized to -1. Implement a program that sends and receives values from

the data vectors to receive buffers using a single collective communication routine

for each case, so that the receive buffers will have the following values:

b)

Task 0: 0 1 -1 -1 -1 -1 -1 -1

Task 1: 2 3 -1 -1 -1 -1 -1 -1

Task 2: 4 5 -1 -1 -1 -1 -1 -1

Task 3: 6 7 -1 -1 -1 -1 -1 -1 c)

Task 0: -1 -1 -1 -1 -1 -1 -1 -1

Task 1: 0 1 8 9 16 17 24 25

Task 2: -1 -1 -1 -1 -1 -1 -1 -1

Task 3: -1 -1 -1 -1 -1 -1 -1 -1 d)

Task 0: 8 10 12 14 16 18 20 22

Task 1: -1 -1 -1 -1 -1 -1 -1 -1

Task 2: 40 42 44 46 48 50 52 54

Task 3: -1 -1 -1 -1 -1 -1 -1 -1

You can start from scratch or use a skeleton file collectives.py.

Tip: create two

communicators

Page 17: Jussi Enkovaara Martti Louhivuori Python in High ...
Page 18: Jussi Enkovaara Martti Louhivuori Python in High ...

Bonus exercises

B1. Game of life Game of life is a cellular automaton devised by John Conway in 70's:

http://en.wikipedia.org/wiki/Conway's_Game_of_Life

The game consists of two dimensional orthogonal grid of cells. Cells are in two

possible states, alive or dead. Each cell interacts with its eight neighbours, and at

each time step the following transitions occur:

- Any live cell with fewer than two live neighbours dies, as if caused by

underpopulation

- Any live cell with more than three live neighbours dies, as if by overcrowding

- Any live cell with two or three live neighbours lives on to the next generation

- Any dead cell with exactly three live neighbours becomes a live cell

The initial pattern constitutes the seed of the system, and the system is left to evolve

according to rules. Deaths and births happen simultaneously.

Implement the Game of Life using Numpy. Try first 32x32 square grid and cross-

shaped initial pattern:

Try also other grids and initial patterns (e.g. random pattern).

Page 19: Jussi Enkovaara Martti Louhivuori Python in High ...

For visualization you can use Matplotlib:

import matplotlib.pyplot as plt

plt.imshow(array)

B2. Rotation with broadcasting Two dimensional coordinates can be rotated by my multiplying the 2-element vector

of coordinates by 2x2 rotation matrix

R=(cos (𝜃) −sin (𝜃)

sin (𝜃) cos (𝜃))

where 𝜃 is the angle of rotation (in radians). Start from the x-y coordinates in the file

points_circle.dat and rotate them by 90°. Utilize broadcasting for performing the

rotation with a single np.dot call. (Hint: you may need to create additional

dimensions). Plot both the original and rotated points in order to see the effect of

rotation.

B3. Two dimensional heat equation Heat (or diffusion) equation is

where u(x, y, t) is the temperature field that varies in space and time, and α is

thermal diffusivity constant. The two dimensional Laplacian can be discretized with

finite differences as

Given an initial condition (u(t=0) = u0) one can follow the time dependence of the

temperature field with explicit time evolution method:

(Note that algorithm is stable only when )

𝜕𝑢

𝜕𝑡= 𝛼𝛻2𝑢

𝛻2𝑢(𝑖, 𝑗) →𝑢(𝑖 − 1, 𝑗)− 2𝑢(𝑖, 𝑗) + 𝑢(𝑖 + 1,𝑗)

(𝛥𝑥)2

+ 𝑢(𝑖, 𝑗 − 1)− 2𝑢(𝑖, 𝑗) + 𝑢(𝑖, 𝑗 + 1)

(𝛥𝑦)2

𝑢𝑚+1(𝑖, 𝑗) = 𝑢𝑚(𝑖, 𝑗) + ∆𝑡 𝛼 𝛻2𝑢𝑚(𝑖, 𝑗)

∆𝑡 < 1

2 𝛼

(𝛥𝑥𝛥𝑦)2

(𝛥𝑥)2 + (𝛥𝑦)2

Page 20: Jussi Enkovaara Martti Louhivuori Python in High ...

Implement two dimensional heat equation with NumPy using the initial temperature

field in the file bottle.dat (the file consists of a header and 200 x 200 data array).

As a boundary condition use fixed values as given in the initial field. You can start

from the skeleton code in the file bonus_heat.py

B4. Parallel heat equation Parallelize the whole heat equation program with MPI, by dividing the grid in rows

and assigning one row block to one task. A domain decomposition, that is.

The tasks are able to update the grid independently everywhere else than on the row

boundaries – there the communication of a single row with the nearest neighbor is

needed. This is realized by having additional ghost layers that contain the boundary

data of the neighboring tasks. As the system is aperiodic, the outermost ranks

communicate with single neighbor, and the inner ranks with the two neighbors.

Insert the proper MPI routines into skeleton code heat_mpi.py (search

for“TODO”s).

A schematic representation of decomposition looks like (row wise composition is

obtained just by rotating 90 degrees):

portion grid

Rank #0 Rank #1 Rank #2 Rank #3

Local of the

Each rank has additional columns in

both ends of the local grid

(colored columns).

The first actual column of Rank #2 is

communicated to “rightmost”

ghostlayer of Rank #1.

Remember to update all ghost layers at each iteration.