Granger Parallel IPython


  • 8/8/2019 Granger Parallel IPython


    Hardware:
    - Cheap, fast and widely available.
    - Our free lunch is over -> single CPUs aren't getting much faster.
    - Transition to multi-CPU and multi-core CPU-based machines.
    - Clusters and grids.

    Software:
    - Software development is labor intensive.
    - Development of parallel codes is very labor intensive.
    - Parallel programming tools and paradigms have not evolved much in the last 2 decades.


    Complex algorithms

    Lots of legacy code still used (BLAS, LAPACK, your own)

    Need for high performance

    The code is always changing

    Large amounts of data

    Scientists love MATLAB, IDL, Mathematica

    Collaborative development/execution


    1. It is open source and accessible to everyone.

    2. Can be used interactively (like MATLAB, Mathematica, IDL, etc.)

    3. Simple, expressive syntax that is readable by human beings.

    4. Powerful enough to use in large, complex applications.

    5. Supports functional, object-oriented, generic and meta programming.

    6. Extremely robust garbage collection.

    7. Powerful built-in data-types and libraries.

    8. Excellent tools for wrapping Fortran/C/C++/ObjC code (SWIG, F2PY, Pyrex, Boost, Weave, PyObjC).

    9. High quality external libraries for visualization (MayaVi), plotting (matplotlib), numerical/scientific computing (NumPy/SciPy), networking (Twisted), etc.

    10. Python bindings for major GUI toolkits (wx, Tk, GTK, Qt).

    11. Cross platform.


    IPython is an enhanced interactive Python shell

    It is the de facto shell for scientific computing in Python.

    It already comes with every major Linux distribution.

    Capabilities:

    Extensible syntax

    GUI integration (wx, Qt, GTK, etc.)

    Seamless system shell access

    Object/namespace introspection

    Command history/recall

    Session logging

    Embeddable

    http://ipython.scipy.org


    Pros:
    - Robust, optimized, standardized, portable, common
    - Existing parallel libraries (FFTW, BLACS, ScaLAPACK, ...)
    - Runs over Ethernet, InfiniBand, Myrinet.

    Cons:
    - Trivial things are not trivial -> lots of boilerplate code.
    - Orthogonal to how scientists think and work.
    - Load balancing and fault tolerance are difficult to implement (even for simple cases).
    - Emphasis on compiled languages (C/C++/Fortran).
    - Non-interactive and non-collaborative.
    - Difficult to integrate into other computing environments (GUIs, visualization and plotting tools, Web based tools, etc.).
    - Labor intensive compile/execute/debug cycles.

    Kernel = Network-aware Python instance

    Python

    - Objects
    - Commands


    Python instance that listens on a network port

    Multi-threaded or multi-process with an execution queue

    Uses Twisted -> asynchronous, non-blocking sockets

    Multi-protocol aware:
    - Custom control protocol
    - SSH, HTTP, ...

    Can be started at any time using SSH, Xgrid, PBS, GridEngine, Condor, ...

    Built-in GUI integration (wx, Qt, Tk, GTK, Cocoa, ...)

    Pass Python objects, commands, modules, I/O, ...

    Auto-discovery using Bonjour/ZeroConf
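
    The idea of a Python instance that listens on a network port and executes what it is sent can be sketched with the standard library alone. This is an illustration under stated assumptions, not the prototype's code: it uses `socketserver` threads instead of Twisted, and the `ToyKernelHandler`, `start_toy_kernel`, and `request` names, as well as the `execute`/`pull` line protocol, are hypothetical.

```python
import socket
import socketserver
import threading


class ToyKernelHandler(socketserver.StreamRequestHandler):
    """Serve 'execute <source>' and 'pull <name>' requests, one per line."""

    def handle(self):
        namespace = self.server.namespace  # shared by every connection
        for raw in self.rfile:
            verb, _, rest = raw.decode("utf-8").strip().partition(" ")
            try:
                if verb == "execute":
                    exec(rest, namespace)  # run the statement in the kernel
                    reply = "OK"
                elif verb == "pull":
                    reply = "OK " + repr(namespace[rest])
                else:
                    reply = "ERR unknown request"
            except Exception as exc:
                # Report the failure to the client instead of killing the kernel.
                reply = "ERR " + type(exc).__name__
            self.wfile.write((reply + "\n").encode("utf-8"))


def start_toy_kernel():
    """Start a kernel on a free localhost port and return the server object."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), ToyKernelHandler)
    server.daemon_threads = True
    server.namespace = {}
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request(port, line):
    """Send one request line to the kernel and return its one-line reply."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall((line + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()
```

    Because the namespace lives in the server, a client can disconnect and a later client still sees the same variables. Running `exec` on network input is unsafe, so the sketch binds to localhost only.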


    Lightweight object-oriented user interface in regular Python

    Additional syntax in IPython (enhanced Interactive Python)

    Medium level of abstraction

    Higher level than MPI

    Doesn't assume a particular high-level model

    Automatic synchronization of kernels (no barrier() calls)

    Non-blocking and blocking modes

    Clean handling of remote I/O

    The user's process can be transient; kernels are persistent
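
    The difference between blocking and non-blocking modes can be sketched as follows. This is a minimal illustration, assuming a hypothetical `ToyClient` class whose single worker thread stands in for a kernel's execution queue; it is not the prototype's interface.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyClient:
    """Hypothetical client: one worker thread acts as the kernel's queue."""

    def __init__(self, block=True):
        self.block = block
        self._namespace = {}
        # A single worker runs submitted commands in submission order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def execute(self, source):
        future = self._pool.submit(exec, source, self._namespace)
        if self.block:
            future.result()  # wait here; re-raises any error from the command
            return None
        return future  # caller decides when (or whether) to wait

    def pull(self, name):
        return self._namespace[name]

    def close(self):
        self._pool.shutdown()
```

    In blocking mode `execute` returns only after the command has run. In non-blocking mode it returns a future immediately, and calling `.result()` on that future waits for completion.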


    Needed if system is used on an open network.

    Start Kernels as user nobody

    Firewall all but a few Gateway Kernels

    Gateway Kernels can have SSL enabled for encrypted communications.

    Authenticate users

    Twisted has SSL/Authentication capabilities built-in.


    Multiple users can connect simultaneously

    Kernels started dynamically at any time


    It is annoying to type ic.execute(...)

    Use IPython's magic command system. Extended syntax!

    %cmd args --> magic_cmd(args)

    ic.block=True/False toggles I/O forwarding
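
    The `%cmd args --> magic_cmd(args)` rewriting can be illustrated with a small translator. This is a sketch of the idea only, not IPython's implementation; `translate_magic`, `run_line`, and the `%upper` magic are hypothetical names, and the arguments are passed to the handler as one string.

```python
def translate_magic(line):
    """Rewrite '%cmd args' as a call to magic_cmd('args'); leave other lines alone."""
    stripped = line.strip()
    if not stripped.startswith("%"):
        return line
    name, _, args = stripped[1:].partition(" ")
    return f"magic_{name}({args.strip()!r})"


def magic_upper(args):
    """Hypothetical magic: return its argument string in upper case."""
    return args.upper()


def run_line(line):
    """Translate a line and, if it was a magic, call the matching handler."""
    source = translate_magic(line)
    if source == line:
        return f"plain Python: {line}"
    return eval(source, {"magic_upper": magic_upper})
```

    For example, `translate_magic("%upper hello world")` yields the source text `magic_upper('hello world')`, which `run_line` then evaluates.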


    push(): one way send to a kernel

    pull(): one way recv from a kernel

    Graceful error handling:
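
    The transcript does not preserve the slide's own example, so the following is only a sketch of what push, pull, and graceful error handling could look like. The `ToyKernel` class and its method signatures are assumptions, and the "kernel" is just a namespace in the local process.

```python
class ToyKernel:
    """Stand-in for a remote kernel: a private namespace in this process."""

    def __init__(self):
        self.namespace = {}

    def push(self, **variables):
        """One-way send: copy named values into the kernel's namespace."""
        self.namespace.update(variables)

    def pull(self, name):
        """One-way receive: return ('ok', value) or ('error', message)."""
        try:
            return ("ok", self.namespace[name])
        except KeyError:
            return ("error", f"NameError: name {name!r} is not defined")

    def execute(self, source):
        """Run source in the kernel; report failures instead of raising."""
        try:
            exec(source, self.namespace)
            return ("ok", None)
        except Exception as exc:
            return ("error", f"{type(exc).__name__}: {exc}")
```

    Returning a status tuple instead of raising means a failed command on one kernel does not interrupt the user's session.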


    Again, it is annoying to type ic.push() and ic.pull()

    Can also scatter lists/arrays
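
    Scattering a list amounts to splitting it into one chunk per kernel and later concatenating the chunks again. A minimal sketch, assuming contiguous chunks and the hypothetical helper names `scatter` and `gather`:

```python
def scatter(seq, n_kernels):
    """Split seq into n_kernels contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(seq), n_kernels)
    chunks, start = [], 0
    for i in range(n_kernels):
        # The first `extra` chunks each take one additional element.
        stop = start + base + (1 if i < extra else 0)
        chunks.append(seq[start:stop])
        start = stop
    return chunks


def gather(chunks):
    """Concatenate the chunks back into one list, preserving order."""
    return [item for chunk in chunks for item in chunk]
```

    With ten elements and four kernels, `scatter(list(range(10)), 4)` gives chunks of sizes 3, 3, 2, and 2, and `gather` restores the original list.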


    Scatters the list/array to the kernels

    Each kernel calls the function on the elements of the array

    Results are gathered back to the local process

    Took 13 lines of code to implement.

    Parallel functions: instant trivial parallelization
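
    The scatter, apply, gather pattern behind a parallel function can be sketched as a decorator. This is not the prototype's 13-line implementation: the `parallel` decorator is a hypothetical name, and local threads stand in for kernels, so it shows the structure without giving real speedup for CPU-bound Python code.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def parallel(n_kernels):
    """Decorator: apply func to each element of a list, one chunk per worker."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(seq):
            items = list(seq)
            # Scatter: stride slicing gives worker i every n-th element.
            chunks = [items[i::n_kernels] for i in range(n_kernels)]
            with ThreadPoolExecutor(max_workers=n_kernels) as pool:
                partial = list(
                    pool.map(lambda chunk: [func(x) for x in chunk], chunks)
                )
            # Gather: undo the stride slicing so results line up with inputs.
            results = [None] * len(items)
            for i, part in enumerate(partial):
                results[i::n_kernels] = part
            return results
        return wrapper
    return decorate


@parallel(n_kernels=4)
def square(x):
    return x * x
```

    Calling `square(range(10))` returns the squares in input order, even though the elements were processed in four separate chunks.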


    Distributed Memory Objects:
    - Data parallel computations

    Task Systems:
    - Dynamically load balanced task system
    - Fault tolerant
    - Could allow tasks to be tightly coupled

    Google's MapReduce:
    - MapReduce is a high-level programming model for processing and generating large data sets on large clusters. Inspired by LISP's map and reduce.

    Interactive implementation is possible.

    GOAL: Make it easy to implement high level constructs
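
    As one example of a high-level construct, the map and reduce phases of a MapReduce-style word count can be sketched in a few lines. This is an assumption-laden illustration: `map_phase`, `reduce_phase`, and `word_count` are hypothetical names, and a local thread pool stands in for the kernels.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_phase(chunk):
    """Map: count the words in one chunk of text."""
    return Counter(chunk.split())


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


def word_count(chunks, n_workers=2):
    """Run the map phase on every chunk in parallel, then fold the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_phase, chunks))
    return reduce(reduce_phase, partials, Counter())
```

    Each chunk is counted independently, so the map phase needs no communication between workers; only the final fold combines results.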


    In the middle of a parallel calculation, you can write a new Python module and load it into the running kernels

    Can also reload() modified modules.

    Can be used to fix bugs during a calculation

    Test new algorithms without restarting
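
    Reloading a modified module in a running interpreter can be demonstrated locally. The sketch below uses `importlib.reload`, the Python 3 equivalent of the `reload()` builtin the slide refers to; the module name `toy_algorithm` and the `demo_reload` function are hypothetical, and everything runs in a single process rather than on remote kernels.

```python
import importlib
import pathlib
import sys
import tempfile


def demo_reload():
    """Import a module, edit its source on disk, and reload it in place."""
    with tempfile.TemporaryDirectory() as folder:
        path = pathlib.Path(folder) / "toy_algorithm.py"
        path.write_text("def step(x):\n    return x + 1\n")
        sys.path.insert(0, folder)
        old_flag = sys.dont_write_bytecode
        sys.dont_write_bytecode = True  # always recompile from source
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("toy_algorithm")
            before = module.step(10)
            # "Fix the bug" by rewriting the source while the process runs.
            path.write_text("def step(x):\n    return x * 100\n")
            module = importlib.reload(module)
            after = module.step(10)
        finally:
            # Restore interpreter state changed for the demonstration.
            sys.dont_write_bytecode = old_flag
            sys.path.remove(folder)
            sys.modules.pop("toy_algorithm", None)
    return before, after
```

    The interpreter keeps running between the two calls to `step`; only the module's code is replaced.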


    Multiple users can connect to a cluster simultaneously.

    Shared namespace and data, common execution queue

    Basic chat facility

    Separation of control and monitoring of kernels

    Some users can monitor the kernels

    Others can control them

    Arbitrary configurations allowed


    MPI is great at this, so let's use it

    Not needed in many cases -> MPI is optional

    Start kernels with mpiexec and call MPI_Init()

    Could wrap other MPI-based libraries.

    User can directly make calls to MPI through Python bindings.

    A high level move() function:


    Collaborative visualization/plotting/GUI control

    Other network interfaces (web, ssh)

    Notebook-like frontend (like Mathematica)

    Integration into other cluster environments (PBS, Condor, GridEngine, Globus)

    Scalability + Performance

    Security

    Full MPI integration

    Other high-level parallel constructs


    The system is open source (BSD) and is part of the IPython project:

    http://ipython.scipy.org

    IPython is the de facto shell for interactive scientific computing in Python and comes with every major Linux distribution.

    The kernel will become the foundation of a new version of IPython.

    The working prototype is publicly available on the IPython subversion repository:

    svn co http://ipython.scipy.org/svn/ipython/ipython/branches/chainsaw ipython1


    Python is a useful tool in scientific computation.

    The future of parallel computing is interactive and collaborative.

    Scientists want free, open source and extensible tools.

    We don't have to give up the tools (Fortran/C/C++/MPI) we love.

    Lots of work remains.