OpenMP in a Heterogeneous World Ayodunni Aribuki Advisor: Dr. Barbara Chapman HPCTools Group...

OpenMP in a Heterogeneous World

Ayodunni Aribuki
Advisor: Dr. Barbara Chapman

HPCTools Group
University of Houston

Top 10 Supercomputers (June 2011)

Why OpenMP

• Shared memory parallel programming model
– Extends C, C++, Fortran
• Directives-based
– Single code for sequential and parallel version
• Incremental parallelism
– Little code modification
• High-level
– Leave multithreading details to compiler and runtime
• Widely supported by major compilers
– Open64, Intel, GNU, IBM, Microsoft, …
– Portable

www.openmp.org

OpenMP Example

    #pragma omp parallel
    {
        int i;
        #pragma omp for
        for (i = 0; i < 100; i++) {
            // do stuff
        }
        // do more stuff
    }

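[Figure: fork-join diagram. After the fork, four threads take iterations 0-24, 25-49, 50-74, and 75-99. An implicit barrier follows the loop, then each thread runs "more stuff" before the join.]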
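A minimal C sketch of the example above, written so that it can be compiled and checked. The names `static_chunk` and `run_example` are illustrative and do not appear on the slide. The block partition in `static_chunk` is one common way a static schedule divides iterations; the exact split is left to the OpenMP implementation. If OpenMP is not enabled at compile time, the pragmas are ignored and the code runs on one thread.

```c
#include <assert.h>

/* Half-open range [lo, hi) of the iterations owned by thread `tid` when
   `n` iterations are split into contiguous blocks over `nthreads` threads.
   Assumption: a simple block partition, as drawn in the figure. */
void static_chunk(int n, int nthreads, int tid, int *lo, int *hi)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    int base = n / nthreads;    /* iterations every thread gets */
    int extra = n % nthreads;   /* leftovers go to the first threads */
    *lo = tid * base + (tid < extra ? tid : extra);
    *hi = *lo + base + (tid < extra ? 1 : 0);
}

/* Mirrors the slide: a work-shared loop, then per-thread "more stuff".
   Returns how many threads executed the "more stuff" part. */
int run_example(int *hits)
{
    int more_stuff_runs = 0;
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 100; i++)
            hits[i] += 1;       /* each iteration runs on exactly one thread */
        /* implicit barrier here: all iterations are done past this point */

        #pragma omp atomic
        more_stuff_runs++;      /* executed once by every thread in the team */
    }
    return more_stuff_runs;
}
```

With four threads, `static_chunk(100, 4, tid, &lo, &hi)` reproduces the 0-24, 25-49, 50-74, 75-99 split shown in the figure.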

Present/Future Architectures & Challenges they pose

[Figure: two diagrams of Node 0 to Node 3, each node with its own memory. One diagram adds an accelerator with separate memory and "many more CPUs". Labeled challenges: Location, Heterogeneity, Scalability.]

Heterogeneous Embedded Platform

Heterogeneous High-Performance Systems

Each node has multiple CPU cores, and some of the nodes are equipped with additional computational accelerators, such as GPUs.

www.olcf.ornl.gov/wp-content/uploads/.../Exascale-ASCR-Analysis.pdf

Programming Heterogeneous Multicore: Issues

• Must map data/computations to specific devices
• Usually involves substantial rewrite of code
• Verbose code
– Move data to/from device x
– Launch kernel on device
– Wait until y is ready/done
• Portability becomes an issue
– Multiple versions of same code
– Hard to maintain

Always hardware-specific!
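To make the "verbose code" point concrete, here is a hedged C sketch of the move-data, launch, wait pattern. Everything in it is hypothetical: `fake_device` and the `dev_*` helpers are stand-ins invented for illustration, and the "device" is simulated with ordinary host memory. It is not a real vendor API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an accelerator with its own memory. */
typedef struct {
    float *mem;     /* "device" memory, simulated on the host heap */
    size_t count;   /* number of floats allocated */
    int done;       /* set when the "kernel" has finished */
} fake_device;

int dev_open(fake_device *d, size_t count)
{
    d->mem = malloc(count * sizeof(float));
    d->count = count;
    d->done = 0;
    return d->mem != NULL;
}

/* Step 1: move data to the device. */
void dev_upload(fake_device *d, const float *src)
{
    memcpy(d->mem, src, d->count * sizeof(float));
}

/* Step 2: launch a kernel (here it simply runs synchronously). */
void dev_launch_scale(fake_device *d, float factor)
{
    for (size_t i = 0; i < d->count; i++)
        d->mem[i] *= factor;
    d->done = 1;
}

/* Step 3: wait until the result is ready (a real API would block here). */
void dev_wait(const fake_device *d)
{
    assert(d->done);
}

/* Step 4: move data back from the device. */
void dev_download(const fake_device *d, float *dst)
{
    memcpy(dst, d->mem, d->count * sizeof(float));
}

void dev_close(fake_device *d)
{
    free(d->mem);
    d->mem = NULL;
}

/* Scaling an array takes six calls instead of a single loop. */
int scale_on_device(float *x, size_t n, float factor)
{
    fake_device d;
    if (!dev_open(&d, n))
        return 0;
    dev_upload(&d, x);
    dev_launch_scale(&d, factor);
    dev_wait(&d);
    dev_download(&d, x);
    dev_close(&d);
    return 1;
}
```

Even in this toy form, the one-line computation is buried under device management calls, and each of those calls would change when the target hardware changes.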

Programming Models? Today’s Scenario

    // Run one OpenMP thread per device per MPI node
    #pragma omp parallel num_threads(devCount)
    if (initDevice()) {
        // Block and grid dimensions
        dim3 dimBlock(12, 12);
        kernel<<<1, dimBlock>>>();
        cudaThreadExit();
    } else {
        printf("Device error on %s\n", processor_name);
    }

    MPI_Finalize();
    return 0;
}

www.cse.buffalo.edu/faculty/miller/Courses/CSE710/heavner.pdf

OpenMP in the Heterogeneous World

• All threads are equal
– No vocabulary for heterogeneity, separate device
• All threads must have access to the memory
– Distributed memories common in embedded systems
– Memories may not be coherent
• Implementations rely on OS and threading libraries
– Memory allocation, synchronization, e.g. Linux, Pthreads

Extending OpenMP Example

    #pragma omp parallel for target(dsp)
    for (j = 0; j < m; j++)
        for (i = 0; i < n; i++)
            c(i,j) = a(i,j) + b(i,j);

[Figure: general purpose processor cores with main memory holding application data, next to a hardware accelerator (HWA) with device cores and its own copy of the application data. Arrows: upload remote data, remote procedure call, download remote data.]
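For comparison with the proposed `target(dsp)` clause, the sketch below uses the `target` construct and `map` clauses that were later standardized in OpenMP 4.0. This is not the extension described on the slide: the function name `matrix_add_offload` and the flat row-major array layout are assumptions made for illustration. Without an offload-capable compiler the region runs on the host, and without OpenMP enabled the pragma is ignored.

```c
#include <assert.h>

/* c = a + b for two n-by-m matrices stored contiguously.
   map(to: ...)   plays the role of "upload remote data",
   the target region plays the role of the remote procedure call,
   map(from: ...) plays the role of "download remote data". */
void matrix_add_offload(const float *a, const float *b, float *c,
                        int n, int m)
{
    assert(n > 0 && m > 0);
    int total = n * m;

    #pragma omp target teams distribute parallel for collapse(2) \
            map(to: a[0:total], b[0:total]) map(from: c[0:total])
    for (int j = 0; j < m; j++)
        for (int i = 0; i < n; i++)
            c[j * n + i] = a[j * n + i] + b[j * n + i];
}
```

The loop body stays the same as in a sequential version; only the directive states where the loop runs and which data moves in each direction.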

Heterogeneous OpenMP Solution Stack

[Figure: OpenMP Parallel Computing Solution Stack.
User layer: OpenMP Application.
Prog. layer (OpenMP API): Directives, Compiler; OpenMP library; Environment variables.
System layer: Runtime library; OS/system support for shared memory.
Cores: Core 1, Core 2, …, Core n.
Also shown: MCAPI, MRAPI, MTAPI.]

• Language extensions
• Efficient code generation
• Target Portable Runtime Interface

Summarizing My Research

• OpenMP on heterogeneous architectures
– Expressing heterogeneity
– Generating efficient code for GPUs/DSPs
    • Managing memories
        – Distributed
        – Explicitly managed
– Enabling portable implementations

Backup

MCA: Generic Multicore Programming

• Solve portability issue in embedded multicore programming
• Defining and promoting open specifications for
– Communication: MCAPI
– Resource Management: MRAPI
– Task Management: MTAPI

(www.multicore-association.org)

Heterogeneous Platform: CPU + Nvidia GPU