Python at Warp Speed

53
Python at Warp Speed Andreas Schreiber Department for Intelligent and Distributed Systems German Aerospace Center (DLR), Cologne/Berlin > PyCon DE 2016 > Andreas Schreiber Python at Warp Speed > 30.10.2016 DLR.de Chart 1

Transcript of Python at Warp Speed

Page 1: Python at Warp Speed

Python at Warp Speed

Andreas SchreiberDepartment for Intelligent and Distributed SystemsGerman Aerospace Center (DLR), Cologne/Berlin

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 1

Page 2: Python at Warp Speed

Introduction

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 2

Scientist, Head of department

Co-Founder, Data Scientist, and Patient

Communities

Page 3: Python at Warp Speed

Python at Warp Speed

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 3

High-Performance Computing

Distributed Computing

Quantum Computing

Page 4: Python at Warp Speed

Algorithmic ViewInput x – Algorithm A – Output y

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 4

High-Perf. Computing•

Compute

Distributed Computing•

Learn

Quantum Computing•

Optimize

Page 5: Python at Warp Speed

High-Performance Computing

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 5

Page 6: Python at Warp Speed

High raw computing power for large science applications• Huge performance on a single / multi-core processors• Huge machines with up to millions of cores

High-Performance Computing

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 6

Images: DLR & https://sciencenode.org

Page 7: Python at Warp Speed

Sunway TaihuLight10.649.600 Cores, 93 PetaFLOPS

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 7

Page 8: Python at Warp Speed

Tianhe-2 (天河二号 ) 3.120.000 Cores, 33,8 PetaFLOPS

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 8

Page 9: Python at Warp Speed

Titan560.640 Cores, 17,5 PetaFLOPS

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 9

Page 10: Python at Warp Speed

Programming HPC

Technologies• MPI (Message Passing Interface)• OpenMP (Open Multi-Processing)• OpenACC (Open Accelerators)• Global Arrays Toolkit• CUDA (GPGPUs)• OpenCL (GPGPUs)

Languages• Fortran• Fortran• C/C++• Python

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 10

Page 11: Python at Warp Speed

Performance Fortran vs. PythonHelicopter SimulationFortran Code

• Developed 1994-1996• parallelized with MPI• Performance optimization

2013/14 with MPI und OpenACC

Performance Comparison• Multi-core CPUs

• Cython mit OpenMP• Python bindings for Global

Array Toolkit• GPGPUs

• NumbaPro

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 11

Page 12: Python at Warp Speed

Core Computation Loops in Pure Python

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 12

for iblades in range(numberOfBlades): for iradial in range(1, dimensionInRadialDirection): for iazimutal in range(dimensionInAzimualDirectionTotal): for i1 in range(len(vx[0])): for i2 in range(len(vx[0][0])): for i3 in range(len(vx[0][0][0])): # wilin-Aufruf 1for iblades in range(numberOfBlades): for iradial in range(dimensionInRadialDirection): for iazimutal in range(1, dimensionInAzimualDirectionTotal): for i1 in range(len(vx[0])): for i2 in range(len(vx[0][0])): for i3 in range(len(vx[0][0][0])): # wilin-Aufruf 2for iDir in range(3): for i in range(numberOfBlades): for j in range(dimensionInRadialDirection): for k in range(dimensionInAzimualDirectionTotal): x[iDir][i][j][k] = x[iDir][i][j][k] +

dt * vx[iDir][i][j][k]

Page 13: Python at Warp Speed

Performance Fortran vs. PythonSingle Core (Xeon E5645, 6 Cores)

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 13

Fortran Cython Numba Numpy Python0

0.5

1

1.5

2

2.5

3

2.51

1.09

0.270.46

0.01

GFl

ops

Page 14: Python at Warp Speed

Performance Fortran vs. PythonMulti-Core (Xeon E5645, 6 Cores)

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 14

Fortran Cython Global Arrays0

2

4

6

8

10

12

14

16

13.64

5.78

1.38

GFl

ops

Page 15: Python at Warp Speed

Performance Fortran vs. PythonGPGPU (NVIDIA Tesla C2075, 448 CUDA-Cores)

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 15

Fortran Numba0

10

20

30

40

50

60

70

80

69.77

7.79

GFl

ops

Page 16: Python at Warp Speed

Performance-Productivity-Space

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 16

C++ / FORTRAN

Cython

NumPy / Numba

Pure Python

Performance

Prod

uctiv

ity /

Sim

plic

ity

Page 17: Python at Warp Speed

Python’s productivity is great • Allows to write code quickly• Wide range of applications

Python’s performance still needs improvements• Code optimization with tools for profiling, code examination, …• Optimized libraries for parallel processing with MPI etc.

Excited to see advancements by community and companies

Productivity vs. Performance of Python

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 17

Page 18: Python at Warp Speed

Annual scientific workshop, in conjunction with Supercomputing conference

State-of-the-art in• Hybrid programming• Comparison with other languages for HPC• Interactive parallel computing• High-performance computing applications • Performance analysis, profiling, and debugging

PyHPC 2016• 6th edition, Nov 14, 2016, Salt Lake City• http://www.dlr.de/sc/pyhpc2016

Workshop „Python for High-Performance and Scientific Computing“ (PyHPC)

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 18

Page 19: Python at Warp Speed

Tools ExampleIntel® VTune™ Amplifier for Profiling

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 19

https://software.intel.com/en-us/intel-vtune-amplifier-xe/

Page 20: Python at Warp Speed

Tools Examplempi4py with Intel MPI Library and Cython

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 20

from mpi4py import MPIcomm = MPI.COMM_WORLDsize = comm.Get_size()rank = comm.Get_rank()name = MPI.Get_processor_name()if rank == 0: print "Rank %d of %d running on %s"

% (rank, size, name) for i in xrange(1, size): rank, size, name = comm.recv(source=i, tag=1) print “Rank %d of %d running on %s"

% (rank, size, name)else: comm.send((rank, size, name), dest=0, tag=1)

http://pythonhosted.org/mpi4py/https://software.intel.com/en-us/intel-mpi-library

mpi4py

Cython

Intel MPI Library

Page 21: Python at Warp Speed

Distributed Computing

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 21

Page 22: Python at Warp Speed

Distributed Computing

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 22

Paul Baran. On Distributed Communication Networks. IEEE Transactions on Communications, 12(1):1–9, March 1964

Page 23: Python at Warp Speed

Driven by data science, machine learning, predictive analytics, …• Tabular data • Time Series• Stream data• Connected data

Scaling up with increased data size from from laptops to clusters

• Many-Task-Computing• Distributed scheduling• Peer-to-peer data sharing

Distributed Computing

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 23

Page 24: Python at Warp Speed

Space Debris: Object Correlation from Sensor Data and Real-Time Collision Detection

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 24

Page 25: Python at Warp Speed

29,000 Objects Larger than 10 cm

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 25

Page 26: Python at Warp Speed

750,000 Objects Larger than 1 cm

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 26

Page 27: Python at Warp Speed

150M Objects Larger than 1 mm

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 27

Page 28: Python at Warp Speed

Space Debris: Graph of Computations

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 28

Page 29: Python at Warp Speed

Directed Acyclic Graph (DAG)

Python has great tools to execute graphs on distributed resources

Graphs

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 29

AB

E FG D

C

Page 30: Python at Warp Speed

PySpark

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 30

https://spark.apache.org/https://spark.apache.org/docs/0.9.0/python-programming-guide.html

from pyspark import SparkContext

logFile = “myfile.txt“ sc = SparkContext("local", "Simple App")logData = sc.textFile(logFile).cache()

numAs = logData.filter(lambda s: 'a' in s).count()numBs = logData.filter(lambda s: 'b' in s).count()

print("Lines with a: %i, lines with b: %i" % (numAs, numBs))

Page 31: Python at Warp Speed

Dask

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 31

http://dask.pydata.org/

import dask.dataframe as dddf = dd.read_csv('2015-*-*.csv')df.groupby(df.user_id).value.mean().compute()

d = {'x': 1, 'y': (inc, 'x'), 'z': (add, 'y', 10)}

Page 32: Python at Warp Speed

TensorFlow

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 32

import tensorflow as tf

with tf.Session() as sess: with tf.device("/gpu:1"): matrix1 = tf.constant([[3., 3.]]) matrix2 = tf.constant([[2.],[2.]]) product = tf.matmul(matrix1, matrix2)

https://www.tensorflow.org/

Page 33: Python at Warp Speed

Quantum Computing

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 33

Page 34: Python at Warp Speed

Design optimization and robust design• Space systems and air transportation systems• Design and evaluation of systems with consideration of

uncertainties

Optimization Problems in Aeronautics & Space

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 34

Page 35: Python at Warp Speed

Machine Learning• Deep learning, pattern recognition, clustering, images recognition,

stream reasoning

Anomaly detection • Monitoring of space systems

Mission planning• Optimization in relation to time, resource allocation, energy

consumption, costs etc.

Verification and validation• Software, embedded systems

Optimization Problems in Aeronautics & Space

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 35

Page 36: Python at Warp Speed

Discrete optimization is basis for many kinds of problems

® Packaging

® Partitioning

® Mapping

® Scheduling

Hope: Quantum Computers solve those problems faster than classical computers

New Research Field: Quantum Computing

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 36

NP-hard problems!

Page 37: Python at Warp Speed

Bits and Qubits

Classical Bits Quantum bits (Qubits)

• “0” or “1”

• Electric voltage

• Superposition of complex base states

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 37

1

0

Page 38: Python at Warp Speed

Quantum Computer

To date, one Quantum Computer is available commercially

• D-Wave Systems, Inc., Canada

• System with ~2000 Qubits („D-Wave 2X“)

• Adiabatic QC

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 38

Images: © D-Wave Systems, Inc.

Page 39: Python at Warp Speed

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 39

Image: © D-Wave Systems, Inc.

Page 40: Python at Warp Speed

Inside D-Wave 2

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 40

Images: © D-Wave Systems, Inc.

Page 41: Python at Warp Speed

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 41

Image: © D-Wave Systems, Inc.

Page 42: Python at Warp Speed

D-Wave Qubit Topology – Chimera Graph

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 42

Page 43: Python at Warp Speed

Rewrite the problem as discrete optimization problem • QUBO: Quadratic unconstrained binary optimization

“Programming” a Quantum Computer – Step 1

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 43

𝐸ሺ𝑞1,…,𝑞𝑛ሻ= 𝑔𝑖𝑛

𝑖=1 𝑞𝑖 + 𝑠𝑖𝑗𝑞𝑖𝑞𝑗𝑛

𝑖,𝑗=1𝑖>𝑗

Page 44: Python at Warp Speed

Mapping to hardwaretopology

Chimera-QUBO

“Programming” a Quantum Computer – Step 2

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 44

Page 45: Python at Warp Speed

Bringing the problem to the QC• Copy weights and coupling

strengths to physical Qubits

“Programming” a Quantum Computer – Step 3

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 45

Page 46: Python at Warp Speed

Starting the actual “computation”:• Adiabatic process from start energy level to target energy level,

which represents solution of the optimization problem• Result are the voltages after this process

“Programming” a Quantum Computer – Step 4

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 46

Ener

gy

Start System End System

Time

Energy Level

Adiabatic Change

Definesruntime

Page 47: Python at Warp Speed

D-Wave Software Environment

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 47

Page 48: Python at Warp Speed

Programming with PythonImport API and Connect to Machine

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 48

import dwave_sapi2.remote as remoteimport dwave_sapi2.embedding as embeddingimport dwave_sapi2.util as utilimport dwave_sapi2.core as core

# print "Connect to DWave machine ...”# create a remote connectiontry: conn = remote.RemoteConnection(myToken.myUrl, myToken.myToken) # get the solver solver = conn.get_solver('C12')except: print 'Unable to establish connection' exit(1)

Page 49: Python at Warp Speed

Programming with PythonPrepare the Problem (“Embedding”)

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 49

hwa = get_hardware_adjacency(solver)

embeddings[eIndex] = embedding.find_embedding(J, hwa)

h_embedded[eIndex], j0, jc, new_embed = embedding.embed_problem(h, J, embeddings[eIndex], hwa)J_embedded[eIndex] = jcJ_embedded[eIndex].update(j0)embeddings[eIndex] = new_embed

Page 50: Python at Warp Speed

Programming with PythonSolve the Problem (“Ising”)

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 50

getEmbeddedIsing(eIndex)# print "Annealing ...”result = core.solve_ising(solver, h_embedded[eIndex], J_embedded[eIndex], annealing_time=20, num_reads=1000)

unembedded_result = embedding.unembed_answer( result['solutions'], embeddings[eIndex], 'minimize_energy', h, J)

# take the lowest energy solutionrawsolution_phys = (np.array(result['solutions']) + 1)/2rawsolution_log = (np.array(unembedded_result) + 1)/2

Page 51: Python at Warp Speed

Result Distribution

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 51

Page 52: Python at Warp Speed

Summary

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 52

High Performance Data Science Future

Architectures

Python is or will become standard in programming for...

Page 53: Python at Warp Speed

> PyCon DE 2016 > Andreas Schreiber • Python at Warp Speed > 30.10.2016DLR.de • Chart 53

Thank You!

Questions?

[email protected]/sc | @onyame