Porting processes to threads with MPC instead of forking

Some slides from Marc Tchiboukdjian (IPDPS’12): Hierarchical Local Storage: Exploiting Flexible User-Data Sharing Between MPI Tasks

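What you do with fork

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* DetectorInfo, XCommonData, loadInfo(), loadCommonData() and tasks
   come from the application and are not shown on the slide. */
int main(void)
{
    DetectorInfo *infos = loadInfo();
    XCommonData *data = loadCommonData();

    for (int i = 0; i < tasks; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* child: do a job by sharing infos & data (copy-on-write) */
            _exit(0);
        } else if (pid < 0) {
            perror("fork");
            break;
        }
        /* parent: the child is reaped by the wait loop below */
    }

    while (wait(NULL) > 0)  /* wait for all children */
        ;
    return 0;
}
```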


What is MPC

A framework for HPC developed by CEA, targeting many-core supercomputers.

On supercomputers, the memory scalability of the runtime becomes an issue.

Example: MPI with one process per core on 128 cores requires 128 copies of the communication buffers and 128 full process address maps.


What is MPC

It mainly provides a thread-based MPI runtime and an OpenMP runtime.

It also provides a patched GCC for some TLS extensions, user threads with a pthread API, and a parallel NUMA-aware memory allocator (my PhD work).


Why it might interest you

Thread-based MPI: MPI tasks run on top of threads instead of processes.

Mechanisms to port non-threaded code to this model:

GCC & ICC patches for automatic privatization of global variables.

An extension of TLS, HLS (Hierarchical Local Storage): you manage sharing with a pragma and can select the sharing level (cache, NUMA, code, …).


MPC threads


Can be used to avoid fork

It looks like you use fork to share some constant memory.

Thanks to automatic privatization, you can move the processes into threads.

HLS lets you manage how segments are shared.

It also handles data that changes (as long as every task makes the same changes).


How to use

Compile and link with the patched GCC, then run with the mpcrun wrapper:

```sh
mpc_cc {YOUR_FILE}
mpcrun -p 1 -c 8 -n 8 ./prgm
```

Use pragmas to manage sharing:

```c
/* share var at node level */
#pragma hls node(var)
/* other scopes, e.g. a cache level */
#pragma hls cache(var) level(3)
/* update the content only once per instance */
#pragma hls single(var) {…}
```


More in code


Current status and limitations

Putting objects in TLS is a research topic (I know my old team has been working on this with Intel; I don't know the current status).

MPC provides a patch for GCC 4.8.

ICC can be used for automatic privatization (-fmpc-privatize), but not for HLS.


Thanks