Mohamed. M. Saad Mohamed A. Mohamedin & Prof. Binoy Ravindran VT-MENA Program Electrical & Computer...

Hydra VM: Extracting Parallelization From Legacy Code Using STM Mohamed. M. Saad Mohamed A. Mohamedin & Prof. Binoy Ravindran VT-MENA Program Electrical & Computer Engineering Department Virginia Polytechnic Institute and State University


Page 1: Mohamed. M. Saad Mohamed A. Mohamedin & Prof. Binoy Ravindran VT-MENA Program Electrical & Computer Engineering Department Virginia Polytechnic Institute.

Hydra VM: Extracting Parallelization From Legacy Code Using STM

Mohamed M. Saad, Mohamed A. Mohamedin & Prof. Binoy Ravindran

VT-MENA Program, Electrical & Computer Engineering Department, Virginia Polytechnic Institute and State University


Outline

Motivation & Objectives
Background
▪ Transactional Memory
▪ Jikes RVM
Program Reconstruction
Architecture
▪ Profiler, Builder & Runtime
Future Work


Motivation

Why Multicores?

Difficult to make single-core clock frequencies even higher

Deeply pipelined circuits
▪ heat problems
▪ speed of light problems
▪ difficult design and verification
▪ large design teams necessary
▪ server farms need expensive air-conditioning


Motivation

Single cores are no longer getting faster; we just get more cores!

The trend is toward multi-core and hyper-threading


Motivation

2005: Sun Niagara (8 cores with hardware multithreading, running 32 hardware threads (HWT))

2010: Supermicro (48-core AMD Opteron)

Now: Sun makes boxes with 128 to 512 hardware threads (16 HWT/core, 8 cores/CPU)!

What about software? Are we ready for this hardware?


Objective

Many applications are designed to use only a few threads

Legacy systems were designed to run on a single processor

Multi-threaded programming is a headache for developers (race conditions, concurrent access, ...)

HydraVM: a Java Virtual Machine prototype, based on Jikes RVM, that targets the use of a large number of cores by automatically detecting portions of code that can run in parallel


Background

Transactional Memory

Jikes RVM (Adaptive Optimization System architecture)


Atomicity

Atomicity: an operation (or set of operations) appears to the rest of the system to occur instantaneously

Example (Money Transfer):
    ...
    synchronized {
        from = from - amount
        to = to + amount
    }
    ...

Example (Money Transfer):
    ...
    account1.lock()
    account2.lock()
    from = from - amount
    to = to + amount
    account1.unlock()
    account2.unlock()
    ...

[Figure: diagram labeled account1, account2, X, Y]
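The lock-based transfer can be sketched in runnable Java roughly as follows. This is only an illustration: the `Account` class, its `id` field and the `LockTransfer` helper are hypothetical names that do not come from the slides, and acquiring the locks in `id` order is an added assumption so that two opposite transfers cannot deadlock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account type; the slide only shows the lock()/unlock() calls.
final class Account {
    final int id;                                   // assumed: used to order lock acquisition
    final ReentrantLock lock = new ReentrantLock();
    long balance;

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }
}

final class LockTransfer {
    // Debit and credit happen while both locks are held, so no other caller of
    // transfer() can observe the state between the two updates.
    static void transfer(Account from, Account to, long amount) {
        // Assumption of this sketch: always lock the lower id first.
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Runs opposite transfers on two threads and returns the final total balance.
    static long demoTotal() {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.balance + b.balance;
    }

    public static void main(String[] args) {
        System.out.println(demoTotal()); // prints 2000
    }
}
```

The total is preserved because every debit is paired with its credit under both locks; without the locks the two `balance` updates could interleave and lose money.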


Mutual Exclusion Locks: the "Classical Approach"

Drawbacks
▪ Deadlock
▪ Livelock
▪ Starvation
▪ Priority inversion
▪ Non-composable
▪ Cost of managing the lock
▪ Non-scalable on multiprocessors

[Figure: diagram labeled A, B, X, Y]


Transactional Memory

Simplifies parallel programming by allowing a group of load and store instructions to execute atomically, using additional primitives

Example (Money Transfer):
    ...
    START-TRANSACTION
    from = from - amount
    to = to + amount
    END-TRANSACTION
    ...

Outcome: Commit, or Rollback & Retry

[Figure: account1, account2; X with account1x, account2x; Y with account1y, account2y]
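Java has no built-in START-TRANSACTION / END-TRANSACTION primitives, so the commit-or-retry behaviour can only be approximated here. The sketch below is not a real STM: it assumes both balances live in one immutable snapshot (the hypothetical `Balances` class) and uses a single compare-and-set as the commit point. A failed compare-and-set plays the role of "Rollback & Retry".

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot of both balances (hypothetical type for this sketch).
final class Balances {
    final long from;
    final long to;

    Balances(long from, long to) {
        this.from = from;
        this.to = to;
    }
}

final class OptimisticTransfer {
    private final AtomicReference<Balances> state;

    OptimisticTransfer(long from, long to) {
        state = new AtomicReference<>(new Balances(from, to));
    }

    // Returns the number of attempts that were needed (1 = committed at once).
    int transfer(long amount) {
        int attempts = 0;
        while (true) {
            attempts++;
            Balances seen = state.get();                      // start: take a snapshot
            Balances next = new Balances(seen.from - amount,  // work on a private copy
                                         seen.to + amount);
            if (state.compareAndSet(seen, next)) {            // end: try to commit
                return attempts;
            }
            // Another thread committed first: drop `next` (rollback) and retry.
        }
    }

    Balances snapshot() {
        return state.get();
    }

    // Four threads each move 5,000 units one at a time.
    static Balances demo() {
        OptimisticTransfer accounts = new OptimisticTransfer(100_000, 0);
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 5_000; n++) accounts.transfer(1);
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return accounts.snapshot();
    }

    public static void main(String[] args) {
        Balances b = demo();
        System.out.println(b.from + " " + b.to); // prints 80000 20000
    }
}
```

Unlike the lock-based version, no thread ever blocks; a thread that loses the race simply recomputes from a fresh snapshot.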


Transactional Memory

Each transaction has a ReadSet and a WriteSet

Transactions conflict if the same variable(s) appear in their ReadSets/WriteSets and at least one of the accesses is a write

Conflicts are resolved by a Contention Manager that employs different policies (Aggressive, Polite, Back-Off, Random, ...)

Aborted code undoes its changes (if required) and retries
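A minimal sketch of the conflict test on read and write sets is shown below. The `TxSets` class and its method names are hypothetical, variables are identified by plain strings, and the check assumes the usual rule that two reads of the same variable do not conflict. Contention-manager policies are not modelled.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical record of what one transaction has read and written so far.
final class TxSets {
    final Set<String> readSet = new HashSet<>();
    final Set<String> writeSet = new HashSet<>();

    TxSets read(String variable) {
        readSet.add(variable);
        return this;
    }

    TxSets write(String variable) {
        writeSet.add(variable);
        return this;
    }

    // Two transactions conflict when one writes a variable that the other
    // reads or writes. Read/read sharing is not treated as a conflict.
    static boolean conflict(TxSets a, TxSets b) {
        return !Collections.disjoint(a.writeSet, b.writeSet)
            || !Collections.disjoint(a.writeSet, b.readSet)
            || !Collections.disjoint(a.readSet, b.writeSet);
    }

    public static void main(String[] args) {
        TxSets t1 = new TxSets().read("x").write("y");
        TxSets t2 = new TxSets().read("x");
        TxSets t3 = new TxSets().write("x");
        System.out.println(conflict(t1, t2)); // false: both only read x
        System.out.println(conflict(t1, t3)); // true: t3 writes x, t1 reads it
    }
}
```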


Transactional Memory

Transactions may be nested (multiple levels)

Inner transactions share the ReadSet/WriteSet of their parent

Inner transactions conflict with each other and with other higher-level transactions

Aborting a parent transaction forces its children to abort

An inner transaction's changes become visible to its parents once it commits successfully, but stay hidden from the outside world until the highest-level transaction commits
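The visibility rule for nested commits can be illustrated with a small single-threaded sketch. Everything here is an assumption made for illustration: the `NestedTx` class is hypothetical, each level keeps its own write buffer that is merged into the parent on commit (a simplification of sharing the parent's sets), and conflict detection is left out entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical nested transaction with buffered writes (no concurrency here).
final class NestedTx {
    private final NestedTx parent;               // null for the top-level transaction
    private final Map<String, Integer> shared;   // the "outside world"
    private final Map<String, Integer> writes = new HashMap<>();

    private NestedTx(NestedTx parent, Map<String, Integer> shared) {
        this.parent = parent;
        this.shared = shared;
    }

    static NestedTx begin(Map<String, Integer> shared) {
        return new NestedTx(null, shared);
    }

    NestedTx beginNested() {
        return new NestedTx(this, shared);
    }

    // Reads see this level's writes first, then the ancestors', then shared memory.
    Integer read(String variable) {
        for (NestedTx t = this; t != null; t = t.parent) {
            if (t.writes.containsKey(variable)) return t.writes.get(variable);
        }
        return shared.get(variable);
    }

    void write(String variable, int value) {
        writes.put(variable, value);
    }

    // An inner commit publishes to the parent only; the top-level commit
    // publishes to the outside world.
    void commit() {
        if (parent != null) parent.writes.putAll(writes);
        else shared.putAll(writes);
        writes.clear();
    }

    // Abort simply discards the buffered writes of this level.
    void abort() {
        writes.clear();
    }

    public static void main(String[] args) {
        Map<String, Integer> memory = new HashMap<>();
        memory.put("x", 1);
        NestedTx top = begin(memory);
        NestedTx inner = top.beginNested();
        inner.write("x", 2);
        inner.commit();
        System.out.println(top.read("x") + " " + memory.get("x")); // prints 2 1
        top.commit();
        System.out.println(memory.get("x"));                       // prints 2
    }
}
```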


Transactional Memory

Hardware Transactional Memory
▪ Modifications in processors, cache and bus protocol
▪ e.g., unbounded HTM, TCC, ...

Software Transactional Memory
▪ Software runtime library or programming language support
▪ Minimal hardware support: CAS, LL/SC
▪ e.g., RSTM, DSTM, ESTM, ...

Hybrid Transactional Memory
▪ Exploits HTM support to achieve hardware performance for transactions that do not exceed the HTM's limitations, and uses STM otherwise
▪ e.g., LogTM, HyTM, ...

Distributed Transactional Memory
▪ Extends transaction primitives to a distributed environment (a network of multiple machines)
▪ e.g., HyFlow, DecentSTM, GenSTM, ...


Jikes RVM

A mature, modular, open-source Java virtual machine designed for research purposes. Unlike most other JVMs, it is written in Java!

Adaptive Optimization System


Program Reconstruction “The Main Idea”

We view a program as a set of basic building blocks
▪ Each block is a set of instructions
▪ A block has a single entry and multiple exits
▪ Blocks may access the same memory (variables)

It is possible to reconstruct the program from these blocks by rearranging them, with some changes to the control instructions

It is even possible to assign each set of blocks to a different thread


Example

    int counter = 0;
    for (int i = 0; i < 12; i++)
        if (Math.random() > 0.3)
            counter++;
        else
            counter--;

Bytecode:

    0: iconst_0
    1: istore_1
    2: iconst_0
    3: istore_2
    4: goto 29
    7: invokestatic #13;
    10: ldc2_w #19;
    13: dcmpl
    14: ifle 23
    17: iinc 1, 1
    20: goto 26
    23: iinc 1, -1
    26: iinc 2, 1
    29: iload_2
    30: bipush 12
    32: if_icmplt 7
    35: return
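One common way to cut such a listing into basic blocks is to compute "leaders": the first instruction, every branch target, and every instruction that follows a branch or return. The sketch below applies that textbook rule to the bytecode of this example. It is not taken from HydraVM; the `Insn` and `BasicBlocks` classes are hypothetical, and method calls such as `invokestatic` are assumed not to end a block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// One bytecode instruction: offset, mnemonic, and branch target (-1 if none).
final class Insn {
    final int offset;
    final String op;
    final int target;

    Insn(int offset, String op, int target) {
        this.offset = offset;
        this.op = op;
        this.target = target;
    }

    Insn(int offset, String op) {
        this(offset, op, -1);
    }

    boolean isBranch() {
        return target >= 0;
    }

    boolean endsBlock() {
        return isBranch() || op.equals("return");
    }
}

final class BasicBlocks {
    // Returns the sorted offsets at which a new basic block starts.
    static List<Integer> leaders(List<Insn> code) {
        TreeSet<Integer> leaders = new TreeSet<>();
        if (!code.isEmpty()) leaders.add(code.get(0).offset);        // method entry
        for (int i = 0; i < code.size(); i++) {
            Insn insn = code.get(i);
            if (insn.isBranch()) leaders.add(insn.target);            // branch target
            if (insn.endsBlock() && i + 1 < code.size()) {
                leaders.add(code.get(i + 1).offset);                  // fall-through successor
            }
        }
        return new ArrayList<>(leaders);
    }

    // The bytecode listing of the example, transcribed by hand.
    static List<Insn> slideExample() {
        return Arrays.asList(
            new Insn(0, "iconst_0"), new Insn(1, "istore_1"),
            new Insn(2, "iconst_0"), new Insn(3, "istore_2"),
            new Insn(4, "goto", 29),
            new Insn(7, "invokestatic"), new Insn(10, "ldc2_w"),
            new Insn(13, "dcmpl"), new Insn(14, "ifle", 23),
            new Insn(17, "iinc"), new Insn(20, "goto", 26),
            new Insn(23, "iinc"),
            new Insn(26, "iinc"),
            new Insn(29, "iload_2"), new Insn(30, "bipush"),
            new Insn(32, "if_icmplt", 7),
            new Insn(35, "return"));
    }

    public static void main(String[] args) {
        System.out.println(leaders(slideExample())); // prints [0, 7, 17, 23, 26, 29, 35]
    }
}
```

Each leader starts a block that runs up to the next leader, which gives seven blocks for this method: the loop setup, the random test, the two branches, the loop increment, the loop condition, and the return.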


Example

    public class Test {
        public static void foo() {
            int counter = 0;
            for (int i = 0; i < 12; i++)
                if (Math.random() > 0.3)
                    counter++;
                else
                    counter--;
        }

        public static void zoo() {
            System.out.println("hi");
        }

        public static void main(String[] args) {
            int i = 6;
            if (i < 10)
                foo();
            else
                zoo();
        }
    }


Architecture


Architecture Profiler

Split code into Basic Blocks

Inject additional instructions into loaded classes to monitor:
▪ Program flow (which Basic Blocks are accessed, and in what order?)
▪ Memory accessed by each Basic Block
▪ Which Basic Blocks do I/O?


Architecture Profiler

Example:

    int C = 0;
    for (int J = 0; J < 12; J++)
        if (Math.random() > 0.3)
            C++;
        else
            C--;

Instrumented bytecode (injected probes are indented):

    0: iconst_0
    1: istore_1
    2: iconst_0
    3: istore_2
        write J
        write C
        visit B1
    4: goto 29
    7: invokestatic #13;
    10: ldc2_w #19;
    13: dcmpl
        read K
        write K
        visit B2
    14: ifle 23
    17: iinc 1, 1
        read C
        write C
        visit B3
    20: goto 26
    23: iinc 1, -1
        read C
        write C
        visit B4
    26: iinc 2, 1
        read J
        write J
        visit B5
    29: iload_2
    30: bipush 12
        visit B6
        read J
    32: if_icmplt 7
    35: return
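The effect of the injected probes can be mimicked at source level. The sketch below hand-instruments the example loop and records block visits and variable accesses in a small in-memory store. The `KnowledgeRepository` and `ProfiledLoop` classes and their methods are hypothetical stand-ins, the real profiler works on bytecode rather than source, and the `K` probes are left out because the slide does not say what `K` denotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.DoubleSupplier;

// Hypothetical store for what the probes observe.
final class KnowledgeRepository {
    final List<String> flow = new ArrayList<>();                 // blocks in visit order
    final Map<String, Set<String>> reads = new TreeMap<>();      // block -> variables read
    final Map<String, Set<String>> writes = new TreeMap<>();     // block -> variables written

    void visit(String block) {
        flow.add(block);
    }

    void read(String block, String variable) {
        reads.computeIfAbsent(block, b -> new TreeSet<>()).add(variable);
    }

    void write(String block, String variable) {
        writes.computeIfAbsent(block, b -> new TreeSet<>()).add(variable);
    }
}

final class ProfiledLoop {
    // Runs the example loop with `bound` iterations, calling the probes by hand.
    static KnowledgeRepository run(int bound, DoubleSupplier random) {
        KnowledgeRepository k = new KnowledgeRepository();
        int c = 0;
        int j = 0;
        k.write("B1", "J"); k.write("B1", "C"); k.visit("B1");    // loop setup
        while (true) {
            k.visit("B6"); k.read("B6", "J");                     // loop condition
            if (j >= bound) break;
            double r = random.getAsDouble();
            k.visit("B2");                                        // random test
            if (r > 0.3) {
                c++;
                k.read("B3", "C"); k.write("B3", "C"); k.visit("B3");
            } else {
                c--;
                k.read("B4", "C"); k.write("B4", "C"); k.visit("B4");
            }
            j++;
            k.read("B5", "J"); k.write("B5", "J"); k.visit("B5"); // loop increment
        }
        return k;
    }

    public static void main(String[] args) {
        KnowledgeRepository k = run(12, Math::random);
        System.out.println(String.join(" ", k.flow));
        System.out.println("reads  = " + k.reads);
        System.out.println("writes = " + k.writes);
    }
}
```

Passing the random source as a parameter is a choice of this sketch so that a run can be made repeatable, for example `run(2, () -> 0.5)` always takes the `C++` branch.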


Architecture Recompilation

Recompile the Java class bytecode into machine code

Replace and reload the class definition in memory


Architecture Code Execution

Run the profiled code

Collect flow and memory access information and store it in the knowledge repository


Architecture Builder

Analyze the knowledge repository information to determine:
▪ Which blocks can be grouped together
▪ Which groups of blocks can be parallelized


Architecture Builder

Program can be represented as a string (each character is a basic block).

Example:

    for (Integer i = 0; i < DIMx; i++) {
        for (Integer j = 0; j < DIMx; j++) {
            for (Integer k = 0; k < DIMy; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }

abjbhcfefghcfefghijbhcfefghcfefghijk

ab(jb(hcfefg)2hi)2jk
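The second string is a compact form of the first, in which `(pattern)n` stands for `pattern` repeated `n` times. A small expander makes that reading checkable. The `TraceNotation` class below is a hypothetical helper written for this purpose; it assumes block labels are single non-digit characters and that a repeat count always follows a closing parenthesis.

```java
// Expands the compact trace notation, e.g. "a(bc)2d" -> "abcbcd".
final class TraceNotation {
    private final String text;
    private int pos;

    private TraceNotation(String text) {
        this.text = text;
    }

    static String expand(String compact) {
        TraceNotation parser = new TraceNotation(compact);
        String result = parser.sequence();
        if (parser.pos != compact.length()) {
            throw new IllegalArgumentException("unmatched ')' at index " + parser.pos);
        }
        return result;
    }

    // Parses block labels and groups until the end of input or a ')'.
    private String sequence() {
        StringBuilder out = new StringBuilder();
        while (pos < text.length() && text.charAt(pos) != ')') {
            char c = text.charAt(pos);
            if (c == '(') {
                pos++;                                   // consume '('
                String body = sequence();                // nested groups are allowed
                if (pos >= text.length()) {
                    throw new IllegalArgumentException("missing ')'");
                }
                pos++;                                   // consume ')'
                int count = repeatCount();
                for (int i = 0; i < count; i++) out.append(body);
            } else {
                out.append(c);                           // a single basic-block label
                pos++;
            }
        }
        return out.toString();
    }

    // Reads the decimal repeat count that follows a group.
    private int repeatCount() {
        int start = pos;
        while (pos < text.length() && Character.isDigit(text.charAt(pos))) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("missing repeat count at index " + pos);
        }
        return Integer.parseInt(text.substring(start, pos));
    }

    public static void main(String[] args) {
        // prints abjbhcfefghcfefghijbhcfefghcfefghijk
        System.out.println(expand("ab(jb(hcfefg)2hi)2jk"));
    }
}
```

Going the other way, from the raw trace to the compact form, means searching for adjacent repeated substrings, which this sketch does not attempt.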


Architecture Builder

ab(jb(hcfefg)2hi)2jk

Externalize common block patterns as methods

Generated methods may be nested

Reconstruct the program as a producer-consumer pattern

Collector
▪ Provides the Executor with suitable blocks as Tasks to execute, according to the program flow up to the current time

Executor
▪ Allocates core threads
▪ Assigns tasks to threads
▪ Requests more blocks from the Collector based on program flow, after all threads complete
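The Collector/Executor split maps naturally onto a thread pool that is fed in batches. The sketch below is only an illustration of that producer-consumer shape using `java.util.concurrent`; the `Collector` and `BlockExecutor` classes are hypothetical, and the batches are fixed in advance instead of being derived from observed program flow.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Producer: hands out the next batch of block tasks (hypothetical stand-in).
final class Collector {
    private final Deque<List<Runnable>> pending = new ArrayDeque<>();

    Collector(List<List<Runnable>> batches) {
        pending.addAll(batches);
    }

    // Returns an empty list once there is nothing left to run.
    synchronized List<Runnable> nextBatch() {
        return pending.isEmpty() ? Collections.<Runnable>emptyList() : pending.removeFirst();
    }
}

// Consumer: runs each batch on a pool and asks for more when all tasks are done.
final class BlockExecutor {
    static void run(Collector collector, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Runnable> batch;
            while (!(batch = collector.nextBatch()).isEmpty()) {
                List<Callable<Object>> calls = new ArrayList<>();
                for (Runnable task : batch) calls.add(Executors.callable(task));
                pool.invokeAll(calls);       // blocks until every task in the batch finished
            }
        } finally {
            pool.shutdown();
        }
    }

    // Two batches: "a" and "b" may run in parallel, "c" only after both.
    static List<String> demo() {
        List<String> order = Collections.synchronizedList(new ArrayList<String>());
        Collector collector = new Collector(Arrays.asList(
            Arrays.<Runnable>asList(() -> order.add("a"), () -> order.add("b")),
            Arrays.<Runnable>asList(() -> order.add("c"))));
        try {
            run(collector, 2);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // "a" and "b" in either order, then "c"
    }
}
```

`invokeAll` was chosen here because it waits for the whole batch, which matches the "after all threads complete" step; a real runtime would likely overlap batches more aggressively.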


Architecture Builder

Problems
▪ Threads may conflict when they access the same variables
▪ Threads may finish out of normal order
▪ Collector may generate invalid tasks

Let's represent each Thread as a Transaction
▪ When two transactions conflict, abort the one that has newer blocks relative to normal execution
▪ A transaction will not commit until the one preceding it in the timeline has finished
▪ A transaction times out if it is not reachable
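The two ordering rules can be sketched with a monitor that lets transactions commit only in program order. This is a simplified illustration, not HydraVM's mechanism: the `OrderedCommitter` class is hypothetical, each transaction is identified by an integer "age" equal to its position in normal execution, and timeouts for unreachable transactions are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

final class OrderedCommitter {
    private int nextAge = 0;                              // age allowed to commit next
    private final List<Integer> commitLog = new ArrayList<>();

    // Conflict rule: the transaction that is later in normal execution is aborted.
    static int victim(int ageA, int ageB) {
        return Math.max(ageA, ageB);
    }

    // Blocks until every earlier transaction has committed, then commits.
    synchronized void commit(int age) throws InterruptedException {
        while (age != nextAge) {
            wait();
        }
        commitLog.add(age);
        nextAge++;
        notifyAll();                                      // wake the successor, if waiting
    }

    synchronized List<Integer> log() {
        return new ArrayList<>(commitLog);
    }

    // Starts n threads in reverse age order; commits still happen in age order.
    static List<Integer> demo(int n) {
        OrderedCommitter committer = new OrderedCommitter();
        List<Thread> threads = new ArrayList<>();
        for (int age = n - 1; age >= 0; age--) {
            final int myAge = age;
            Thread t = new Thread(() -> {
                try {
                    committer.commit(myAge);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return committer.log();
    }

    public static void main(String[] args) {
        System.out.println(demo(4));      // prints [0, 1, 2, 3]
        System.out.println(victim(2, 5)); // prints 5
    }
}
```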


Architecture Code Execution (revisited)

Collect which transactions conflict, and the commit rate

We can refine the constructed program

The Builder reorganizes the generated blocks and recompiles the code


Ongoing & Future Work

Complete the implementation of HydraVM

Profile by monitoring memory instead of generating new instructions

Automatically use Java NIO to handle I/O operations, generating callbacks to process them

Use thread scheduling techniques instead of TM

Formally verify that reconstructed programs match the desired semantics