MATLAB HPCS Extensions


PERCS Program Review

MATLAB HPCS Extensions

Presented by: David Padua, University of Illinois at Urbana-Champaign


Contributors

• Gheorghe Almasi - IBM Research

• Calin Cascaval - IBM Research

• Siddhartha Chatterjee - IBM Research

• Basilio Fraguela - University of Illinois

• Jose Moreira - IBM Research

• David Padua - University of Illinois


Objectives

• To develop MATLAB extensions for accessing, prototyping, and implementing scalable parallel algorithms.

• As a result, to give programmers of high-end machines access to all the powerful features of MATLAB:
– Array operations / kernels.
– Interactive interface.
– Rendering.


Uses of the MATLAB Extension

• Interface for parallel libraries users.

• Interface for parallel library developers.

• Input to a “conventional” compiler.

• Input to a linear algebra compiler.
– A library generator/tuner for parallel machines.
– Leverages an NSF-ITR project with K. Pingali (Cornell) and J. DeJong (Illinois).


Design Requirements

• A minimal extension.

• A natural extension to MATLAB that is easy to use.

• Extensions for direct control of parallelism and communication, on top of the ability to access parallel library routines; it does not seem possible to encapsulate all the important parallelism in library routines.

• Extensions that provide the necessary information and can be automatically and effectively analyzed for compilation and translation.


The Design

• No existing MATLAB extension had the characteristics we needed.

• We designed a data type that we call hierarchically tiled arrays (HTAs). These are arrays whose components can themselves be arrays or other HTAs. Operations on HTAs represent computation or communication.


Approach

• In our approach, the programmer interacts with a copy of MATLAB running on a workstation.

• The workstation controls parallel computation on servers.


Approach (Cont.)

• All conventional MATLAB operations are executed on the workstation.

• The parallel server operates on the HTAs.

• The HTA type is implemented as a MATLAB toolbox. This enables implementation as a language extension and simplifies porting to future versions of MATLAB.
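As a rough illustration of the mechanism such a toolbox can rely on, the sketch below overloads MATLAB's subsref for a toy tiled type. This is not the actual HTA toolbox code; the class name htaDemo and its single tiles property are assumptions made here for exposition.

% Minimal sketch of indexing overloading for a toy tiled type (save as htaDemo.m).
% NOT the actual HTA toolbox implementation.
classdef htaDemo
  properties
    tiles                                        % cell array holding the tiles
  end
  methods
    function obj = htaDemo(tiles)
      obj.tiles = tiles;                         % wrap an existing cell array
    end
    function varargout = subsref(obj, s)
      if strcmp(s(1).type, '{}')                 % tile-level access: h{i,j} ...
        out = obj.tiles{s(1).subs{:}};
        if numel(s) > 1                          % ... optionally followed by (r,c)
          out = subsref(out, s(2:end));
        end
        varargout{1} = out;
      else                                       % default behavior for h.prop, h(...)
        [varargout{1:nargout}] = builtin('subsref', obj, s);
      end
    end
  end
end

With h = htaDemo(mat2cell(magic(8), [4 4], [4 4])), the expression h{2,1}(3,2) returns element (7,2) of the original 8 x 8 matrix.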


Interpretation and Compilation

• A first implementation based on the MATLAB interpreter has been developed.

• This implementation will be used to improve our understanding of the extensions and will serve as a basis for further development and tuning.

• Interpretation overhead may hinder performance, but parallelism can compensate for the overhead.

• Future work will include the implementation of a compiler for MATLAB and our extensions based on the effective strategies of L. DeRose and G. Almasi.


From: G. Almasi and D. Padua, “MaJIC: Compiling MATLAB for Speed and Responsiveness,” PLDI 2002.


Hierarchically Tiled Arrays

• Array tiles are a powerful mechanism to enhance locality in sequential computations and to represent data distribution across parallel systems.

• Several levels of tiling are useful to distribute data across parallel machines with a hierarchical organization and to simultaneously represent both data distribution and memory layout.

• For example, a two-level hierarchy of tiles can be used to represent:
– the data distribution on a parallel system, and
– the memory layout within each component.
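A minimal sketch of such a two-level tiling, using plain MATLAB cell arrays rather than the HTA toolbox; the 16 x 16 matrix and the 2 x 2 tilings at each level are illustrative assumptions:

% Two-level tiling of a 16 x 16 matrix with plain cell arrays (illustration only):
% the outer 2 x 2 level stands for the distribution across a parallel system,
% the inner 2 x 2 level for the memory layout within each component.
M = rand(16);
outer = mat2cell(M, [8 8], [8 8]);               % distribution-level tiles
H = cellfun(@(t) mat2cell(t, [4 4], [4 4]), ...
            outer, 'UniformOutput', false);      % layout-level tiles within each one
H{1,2}{2,1}(3,4)                                 % same value as M(7,12)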


Hierarchically Tiled Arrays (Cont.)

• Computation and communication are represented as array operations on HTAs.

• Using array operations for communication and computation raises the level of abstraction and, at the same time, facilitates optimization.


Using HTAs for Locality Enhancement

for I = 1:q:n
  for J = 1:q:n
    for K = 1:q:n
      for i = I:I+q-1
        for j = J:J+q-1
          for k = K:K+q-1
            C(i,j) = C(i,j) + A(i,k)*B(k,j);
          end
        end
      end
    end
  end
end

Tiled matrix multiplication using conventional arrays


Using HTAs for Locality Enhancement

for i = 1:m
  for j = 1:m
    for k = 1:m
      C{i,j} = C{i,j} + A{i,k}*B{k,j};
    end
  end
end

Tiled matrix multiplication using HTAs

• Here, C{i,j}, A{i,k}, and B{k,j} represent submatrices (tiles).

• The * operator denotes matrix multiplication in MATLAB.
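For reference, the loop above runs unchanged on plain MATLAB cell arrays, which gives a sequential emulation of the HTA version (no toolbox, no parallelism); the matrix and tile sizes below are assumptions for illustration:

% Sequential cell-array emulation of the tiled HTA loop (sizes are assumed).
n = 8;  q = 4;  m = n/q;                         % n x n matrices, q x q tiles
A = mat2cell(rand(n), q*ones(1,m), q*ones(1,m));
B = mat2cell(rand(n), q*ones(1,m), q*ones(1,m));
C = mat2cell(zeros(n), q*ones(1,m), q*ones(1,m));
for i = 1:m
  for j = 1:m
    for k = 1:m
      C{i,j} = C{i,j} + A{i,k}*B{k,j};           % same statement as the HTA version
    end
  end
end
norm(cell2mat(C) - cell2mat(A)*cell2mat(B))      % close to zero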


Using HTAs to Represent Data Distribution and Parallelism

Cannon's Algorithm

Initial placement of the operand tiles on a 4 x 4 mesh (each cell shows the A and B tiles held by that node):

A{1,1}B{1,1}   A{1,2}B{2,2}   A{1,3}B{3,3}   A{1,4}B{4,4}
A{2,2}B{2,1}   A{2,3}B{3,2}   A{2,4}B{4,3}   A{2,1}B{1,4}
A{3,3}B{3,1}   A{3,4}B{4,2}   A{3,1}B{1,3}   A{3,2}B{2,4}
A{4,4}B{4,1}   A{4,1}B{1,2}   A{4,2}B{2,3}   A{4,3}B{3,4}


Subsequent slides animate the circular shifts of Cannon's algorithm on the same mesh: after each multiplication step, every tile of A moves one position to the left within its row and every tile of B moves one position up within its column, so that each node again holds a matching pair of tiles to multiply.

C{1:n,1:n} = zeros(p,p);               % communication
…
for k = 1:n
  C{:,:} = C{:,:} + A{:,:}*B{:,:};     % computation
  A{1:n,1:n} = A{1:n,[2:n, 1]};        % communication
  B{1:n,1:n} = B{[2:n, 1],1:n};        % communication
end

Cannon's Algorithm in MATLAB with HPCS Extensions


Cannnon’s Algorithm in C + MPI

for (km = 0; km < m; km++) {
  char *chn = "T";
  dgemm(chn, chn, lclMxSz, lclMxSz, lclMxSz, 1.0, a, lclMxSz,
        b, lclMxSz, 1.0, c, lclMxSz);
  MPI_Isend(a, lclMxSz * lclMxSz, MPI_DOUBLE, destrow,
            ROW_SHIFT_TAG, MPI_COMM_WORLD, &requestrow);
  MPI_Isend(b, lclMxSz * lclMxSz, MPI_DOUBLE, destcol,
            COL_SHIFT_TAG, MPI_COMM_WORLD, &requestcol);
  MPI_Recv(abuf, lclMxSz * lclMxSz, MPI_DOUBLE, MPI_ANY_SOURCE,
           ROW_SHIFT_TAG, MPI_COMM_WORLD, &status);
  MPI_Recv(bbuf, lclMxSz * lclMxSz, MPI_DOUBLE, MPI_ANY_SOURCE,
           COL_SHIFT_TAG, MPI_COMM_WORLD, &status);
  MPI_Wait(&requestrow, &status);
  aptr = a;  a = abuf;  abuf = aptr;
  MPI_Wait(&requestcol, &status);
  bptr = b;  b = bbuf;  bbuf = bptr;
}


Speedups on a four-processor IBM SP-2


Speedups on a nine-processor IBM SP-2


Flattening

• Elements of an HTA are referenced using a tile index for each level in the hierarchy, followed by an array index. Each tile index tuple is enclosed in braces ({}) and the array index is enclosed in parentheses.
– In the matrix multiplication code, C{i,j}(3,4) refers to element (3,4) of submatrix {i,j}.

• Alternatively, the tiled array could be accessed as a flat array as shown in the next slide.

• This feature is useful when an algorithm needs a global view of the array, and also when transforming sequential code into parallel form.


Two Ways of Referencing the Elements of an 8 x 8 Array.

Each element of an 8 x 8 array C tiled into four 4 x 4 submatrices can be referenced either through its tile, as C{t,u}(i,j), or directly in the flattened view, as C(r,c), where r = 4(t-1)+i and c = 4(u-1)+j. For example, C{1,1}(1,1) is C(1,1), C{1,2}(1,1) is C(1,5), C{2,1}(3,2) is C(7,2), and C{2,2}(4,4) is C(8,8).
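The mapping above reduces to simple index arithmetic; the sketch below (plain MATLAB, not the HTA toolbox) reproduces it for the 8 x 8 example with 4 x 4 tiles:

% Index arithmetic behind flattening for the 8 x 8 example (q is the tile size).
q = 4;
% tiled -> flat: C{t,u}(i,j) names the same element as C(r,c)
t = 2;  u = 1;  i = 3;  j = 2;
r = q*(t-1) + i;                  % r = 7
c = q*(u-1) + j;                  % c = 2
% flat -> tiled: C(r,c) names the same element as C{t,u}(i,j)
t2 = floor((r-1)/q) + 1;          % t2 = 2
u2 = floor((c-1)/q) + 1;          % u2 = 1
i2 = mod(r-1, q) + 1;             % i2 = 3
j2 = mod(c-1, q) + 1;             % j2 = 2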


Status

• We have completed the implementation of practically all of our initial language extensions (for IBM SP-2 and Linux Clusters).

• Following the toolbox approach has been a challenge, but we have been able to overcome all obstacles.


Conclusions

• We have developed parallel extensions to MATLAB.

• It is possible to write highly readable parallel code for both dense and sparse computations with these extensions.

• The HTA objects and operations have been implemented as a MATLAB toolbox, which enabled their introduction as a language extension.