Parallel Processing Institute · Fudan University

Posted on 11-Jan-2016

Outline

- Motivation
- Design & Implementation
- Evaluation
- Future work

The popularity of Java (chart; value shown: 20.299%)

Java!
- Architecture neutral
- Simplified memory management
- Security and productivity
- …

Write Once Run Anywhere

How to further improve Java runtime performance?

Our Research
- Leverage the synergy between static and dynamic optimizations:
  - a dynamic environment that still leverages static benefits
  - finding performance opportunities before runtime
  - static annotations to help runtime optimization

Opencj
- Our first milestone in the whole project
- Developed based on Open64
- Takes Java source files or class files as input
- Outputs executable code for Linux on IA32 and x86-64
- The compilation process is similar to compiling C/C++ applications

Design Overview of Opencj
- Migrate the gcj frontend into Open64

Java exception handling
- Similar to C++ exceptions, but with some differences:
  - runtime exceptions such as a/0 and NullPointerException
  - no “catch-all” handler as used in C++
  - the “finally” mechanism, which makes Java exceptions more complex than C++ exceptions
- The key point of Java exception handling is to record the relationship among try/catch/finally blocks.
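A small Java illustration of the points above; the class and method names are made up for this example. The division and the dereference raise runtime exceptions with no explicit throw, and the finally block runs on both the normal and the exceptional path, which is why a static compiler has to record how the try, catch, and finally regions relate.

```java
class ExceptionDemo {
    static final StringBuilder trace = new StringBuilder();

    // a / b throws ArithmeticException when b == 0 without any explicit throw,
    // so every integer division is a potential exception point.
    static int divide(int a, int b) {
        try {
            return a / b;
        } catch (ArithmeticException e) {
            trace.append("catch;");
            return -1;
        } finally {
            // Executes after the try or catch block, on every path out of divide.
            trace.append("finally;");
        }
    }

    // Dereferencing null throws NullPointerException implicitly.
    static int length(String s) {
        try {
            return s.length();
        } catch (NullPointerException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(divide(10, 2)); // 5
        System.out.println(divide(1, 0));  // -1
        System.out.println(length(null));  // 0
        System.out.println(trace);         // finally;catch;finally;
    }
}
```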

Devirtualization
- Virtual calls make code easy to reuse for programmers but hard for the compiler to analyze
- Resolve Java virtual function calls to turn indirect calls into direct calls
- Uses class hierarchy analysis and rapid type analysis
- Devirtualization is implemented in the IPA phase
- Many optimizations can benefit from this transformation
- In the SciMark 2.0 Java benchmark, it resolves all 21 user-defined virtual function calls
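A minimal sketch of how class hierarchy analysis can decide that a virtual call has a single target, with an optional rapid-type-analysis style filter. The ClassHierarchy model and its methods are hypothetical, not Opencj's IPA data structures, and the filter assumes the set of instantiated classes is already known (real RTA builds that set together with the call graph).

```java
import java.util.*;

// Hypothetical class-hierarchy model for illustration only.
class ClassHierarchy {
    private final Map<String, String> parent = new HashMap<>();          // class -> superclass
    private final Map<String, List<String>> children = new HashMap<>();  // class -> direct subclasses
    private final Map<String, Set<String>> defines = new HashMap<>();    // class -> methods with a body

    void addClass(String name, String superName, String... methods) {
        parent.put(name, superName);
        defines.put(name, new HashSet<>(Arrays.asList(methods)));
        children.computeIfAbsent(name, k -> new ArrayList<>());
        if (superName != null) {
            children.computeIfAbsent(superName, k -> new ArrayList<>()).add(name);
        }
    }

    // Method lookup: walk up from cls to the nearest class that defines method.
    String lookup(String cls, String method) {
        for (String c = cls; c != null; c = parent.get(c)) {
            if (defines.getOrDefault(c, Set.of()).contains(method)) return c;
        }
        return null;
    }

    // CHA: the targets of recv.method(), where recv's declared type is staticType, are the
    // implementations seen from staticType and all of its subclasses. If instantiated is
    // non-null, only those classes are considered (RTA-style filter).
    Set<String> possibleTargets(String staticType, String method, Set<String> instantiated) {
        Set<String> targets = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(staticType);
        while (!work.isEmpty()) {
            String c = work.pop();
            if (instantiated == null || instantiated.contains(c)) {
                String impl = lookup(c, method);
                if (impl != null) targets.add(impl + "." + method);
            }
            work.addAll(children.getOrDefault(c, List.of()));
        }
        return targets;
    }

    // A call site with exactly one possible target can become a direct call.
    boolean canDevirtualize(String staticType, String method, Set<String> instantiated) {
        return possibleTargets(staticType, method, instantiated).size() == 1;
    }
}
```

With Circle and Square both overriding area(), a call through a Shape reference stays virtual under CHA alone; if only Circle is ever instantiated, the RTA-style filter leaves a single target and the call can be made direct.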

Synchronization elimination
- Based on escape analysis:
  - flow-insensitive and interprocedural
  - a connection graph captures the connectivity relationships among objects and object references
  - this makes it easy to determine whether an object is local to a thread
- If a synchronized object is local to a thread, the synchronization operation can be removed
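Two hypothetical Java methods showing the kind of distinction escape analysis draws. The StringBuffer in localOnly is never reachable from another thread, so the locking done by its synchronized append method is a candidate for removal; the lock object in published is stored in a static field, so its monitor has to stay. The class and method names are illustrative only.

```java
class SyncExamples {
    static Object shared; // a static field: anything stored here is GlobalEscape

    // sb is created and used only inside this method and is never stored where
    // another thread could reach it, so the locks taken by the synchronized
    // StringBuffer.append calls could be removed.
    static String localOnly(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    // lock is published through a static field, so another thread may
    // synchronize on it (GlobalEscape); this monitor must be kept.
    static int published() {
        Object lock = new Object();
        shared = lock;
        synchronized (lock) {
            return 1;
        }
    }
}
```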

Building the connection graph
Only five kinds of statements:
1. p = new P()
2. p = return_new_P()
3. p = q
4. p = q.f
5. p.f = q
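The sketch below records the five statement kinds and solves them flow-insensitively. All names are hypothetical, and to stay short it swaps in a simplification: it keeps points-to sets from reference nodes to allocation sites plus per-object field nodes, instead of the deferred edges of a full connection graph. It also assumes a kind 2 statement has already been mapped onto the callee's return node. Roots such as static fields or parameters get an initial escape state, and every object reachable from a root inherits at least that state.

```java
import java.util.*;

// Minimal, hypothetical flow-insensitive escape sketch; not Opencj's code.
class ConnectionGraph {
    enum Escape { NO_ESCAPE, ARG_ESCAPE, GLOBAL_ESCAPE }

    private final List<String[]> stmts = new ArrayList<>();
    private final Map<String, Set<String>> pointsTo = new HashMap<>(); // reference -> objects
    private final Map<String, Set<String>> fieldsOf = new HashMap<>(); // object -> field nodes
    private final Map<String, Escape> roots = new HashMap<>();          // reference -> initial state
    private final Map<String, Escape> state = new HashMap<>();          // object -> escape state

    // Statement kinds 1, 3, 4, 5. Kind 2 (p = return_new_P()) is modeled, by assumption,
    // as a copy from the callee's return node, e.g. copy("p", "return_new_P.ret").
    void newObj(String p, String site)       { stmts.add(new String[] {"new", p, site}); }
    void copy(String p, String q)            { stmts.add(new String[] {"copy", p, q}); }
    void load(String p, String q, String f)  { stmts.add(new String[] {"load", p, q, f}); }
    void store(String p, String f, String q) { stmts.add(new String[] {"store", p, f, q}); }
    void markRoot(String ref, Escape e)      { roots.put(ref, e); }

    private Set<String> pts(String node) {
        return pointsTo.computeIfAbsent(node, k -> new HashSet<>());
    }

    // Field node obj.f, remembered so escape states can flow through it.
    private String field(String obj, String f) {
        fieldsOf.computeIfAbsent(obj, k -> new HashSet<>()).add(obj + "." + f);
        return obj + "." + f;
    }

    // Flow-insensitive: re-apply every statement until nothing changes.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] s : stmts) {
                switch (s[0]) {
                    case "new":  changed |= pts(s[1]).add(s[2]); break;
                    case "copy": changed |= pts(s[1]).addAll(new ArrayList<>(pts(s[2]))); break;
                    case "load":  // p = q.f
                        for (String o : new ArrayList<>(pts(s[2]))) changed |= pts(s[1]).addAll(pts(field(o, s[3])));
                        break;
                    case "store": // p.f = q
                        for (String o : new ArrayList<>(pts(s[1]))) changed |= pts(field(o, s[2])).addAll(pts(s[3]));
                        break;
                }
            }
        }
        propagate();
    }

    // Every object reachable from a root gets at least the root's escape state.
    private void propagate() {
        for (Map.Entry<String, Escape> root : roots.entrySet()) {
            Set<String> seen = new HashSet<>();
            Deque<String> work = new ArrayDeque<>(pts(root.getKey()));
            while (!work.isEmpty()) {
                String obj = work.pop();
                if (!seen.add(obj)) continue;
                if (root.getValue().compareTo(escapeOf(obj)) > 0) state.put(obj, root.getValue());
                for (String fld : fieldsOf.getOrDefault(obj, Set.of())) work.addAll(pts(fld));
            }
        }
    }

    Escape escapeOf(String obj) { return state.getOrDefault(obj, Escape.NO_ESCAPE); }

    // Here, objects that are not GlobalEscape are treated as thread-local,
    // so synchronization on them could be removed.
    boolean canRemoveSync(String obj) { return escapeOf(obj) != Escape.GLOBAL_ESCAPE; }
}
```

For example, after a = new P(); b = new P(); a.f = b; G = a (G a static field), both allocation sites become GlobalEscape because b's object is reachable through a.f, while an unrelated c = new P() stays NoEscape.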

Analysis process
- Intraprocedural analysis
  - Check every call graph node to find out whether there is a synchronized call in a PU (program unit)
  - Set the initial escape state of each reference node
- Interprocedural analysis
  - Start from the main function and traverse the call graph in depth-first order
  - Pass escape states between caller and callee
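One possible reading of the interprocedural step, sketched under assumptions: a depth-first traversal from main orders methods so that callees come before callers, and each caller learns from its callees which of its own parameters escape globally. The EscapeSummaries class, the Forward call-site record, and the per-parameter summary are hypothetical, not Opencj's implementation.

```java
import java.util.*;

// Hypothetical summary-based interprocedural pass.
class EscapeSummaries {
    // A call inside a method that passes the caller's parameter fromParam
    // as argument toParam of callee.
    static final class Forward {
        final String callee;
        final int fromParam, toParam;
        Forward(String callee, int fromParam, int toParam) {
            this.callee = callee;
            this.fromParam = fromParam;
            this.toParam = toParam;
        }
    }

    // Intraprocedural result: parameters that are GlobalEscape inside each method.
    final Map<String, Set<Integer>> localGlobal = new HashMap<>();
    final Map<String, List<Forward>> calls = new HashMap<>();

    // Depth-first traversal from the entry method; callees are added before callers.
    private void postOrder(String m, Set<String> seen, List<String> order) {
        if (!seen.add(m)) return;
        for (Forward f : calls.getOrDefault(m, List.of())) postOrder(f.callee, seen, order);
        order.add(m);
    }

    // For every method reachable from entry, the parameters that escape globally.
    Map<String, Set<Integer>> solve(String entry) {
        List<String> order = new ArrayList<>();
        postOrder(entry, new HashSet<>(), order);
        Map<String, Set<Integer>> summary = new HashMap<>();
        for (String m : order) summary.put(m, new HashSet<>(localGlobal.getOrDefault(m, Set.of())));
        boolean changed = true;
        // Without recursion the first pass already reaches the fixpoint;
        // recursive cycles may need further passes.
        while (changed) {
            changed = false;
            for (String m : order) {
                for (Forward f : calls.getOrDefault(m, List.of())) {
                    // The callee's escape state flows back to the caller's argument.
                    if (summary.get(f.callee).contains(f.toParam)) {
                        changed |= summary.get(m).add(f.fromParam);
                    }
                }
            }
        }
        return summary;
    }
}
```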

Example 1 (figure; escape-state labels: GlobalEscape, OutEscape, GlobalEscape, NoEscape, OutEscape)

Example 1, continued (figure; escape-state labels: GlobalEscape, NoEscape, GlobalEscape, NoEscape)

Example 2 (figure; escape-state labels: GlobalEscape, ArgEscape, ArgEscape, NoEscape, GlobalEscape)

Example 2, continued (figure; escape-state labels: NoEscape, GlobalEscape, GlobalEscape, NoEscape)

Array bounds check elimination
- Array bounds checks guarantee type-safe execution in Java
- They prevent many useful code optimizations, since a bounds check may raise an exception
- Full elimination: remove a check if it can never fail
- Partial elimination: whenever possible, move bounds checks out of loops

Example of ABCE (figure)

Fully redundant check elimination: example (figure; annotations: 0<=i1<100, jc1)

Fully redundant check elimination: example, continued (figure)
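Full elimination needs a proof that the checked index always lies inside the array. A minimal sketch, assuming some range analysis has already produced bounds for the index (a fact of the same shape as the slide's 0<=i1<100 annotation); FullRedundancy and checkNeverFails are illustrative names only.

```java
class FullRedundancy {
    // A bounds check on a[index] never fails if range analysis has proven
    // lo <= index <= hi and the array's length is known to be at least minLength.
    static boolean checkNeverFails(long lo, long hi, long minLength) {
        return lo >= 0 && hi < minLength;
    }

    static int[] fill() {
        int[] a = new int[100];
        // The loop bounds give 0 <= i <= 99 and a.length == 100, so
        // checkNeverFails(0, 99, 100) holds and the check on a[i] is fully redundant.
        for (int i = 0; i < 100; i++) {
            a[i] = i;
        }
        return a;
    }
}
```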

Partial elimination
- Adopts the loop versioning technique to guarantee Java exception semantics
- Sets trigger conditions before and after the optimized loop

Partial redundant check elimination: example (figure)
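One way to read "trigger conditions before and after the optimized loop" is sketched below; it is an assumption, not Opencj's code, and the class and method names are hypothetical. A test before the loop computes the iteration range that is provably in bounds and runs it without checks, and a checked loop after it covers any remaining iterations, so an out-of-bounds index still raises ArrayIndexOutOfBoundsException at the same iteration as the original loop would. In Java source the JVM still checks a[i] itself; the fast loop only marks what compiled code would emit without checks.

```java
class LoopVersioning {
    // Explicit bounds check, standing in for the check the compiler would emit.
    static int checkedLoad(int[] a, int i) {
        if (i < 0 || i >= a.length) throw new ArrayIndexOutOfBoundsException(i);
        return a[i];
    }

    // Original loop: one check per iteration.
    static long sumChecked(int[] a, int lo, int hi) {
        long sum = 0;
        for (int i = lo; i < hi; i++) sum += checkedLoad(a, i);
        return sum;
    }

    // Versioned loop: the pre-loop test proves [lo, safeHi) in bounds.
    static long sumVersioned(int[] a, int lo, int hi) {
        long sum = 0;
        int i = lo;
        if (lo >= 0) {
            int safeHi = Math.min(hi, a.length);
            for (; i < safeHi; i++) sum += a[i];   // check-free in compiled code (conceptually)
        }
        // Remaining iterations, if any, keep their checks: the first
        // out-of-bounds index throws here, exactly as in sumChecked.
        for (; i < hi; i++) sum += checkedLoad(a, i);
        return sum;
    }
}
```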

Checks eliminated by ABCE (figure)
- Total: the total number of checks in the test case
- PRCE: the number of partially redundant checks eliminated
- FRCE: the number of fully redundant checks eliminated
- ABCE: FRCE + PRCE
- 28.4% speedup on the SciMark2 test, lower than we expected

Performance gap between Java & C (chart, higher is better); configurations compared:
- opencj -O3 -IPA -fno-bounds-check
- opencc -O3 -IPA
- gcj -O3 -fno-bounds-check -funroll-loops
- gcc -O3 -funroll-loops

Static compilation vs JIT (chart, higher is better)
Comparing two Java running modes:
- running in a JVM
- running the executable file directly

Static compilation vs JIT, continued (chart, lower is better)
- JDK 1.6 performs best, except on mpegaudio
- More analysis work needs to be done

Future Trends for Java
Where Java is headed with its dynamic optimization framework:
- Exploring opportunities to achieve performance parity with native code
- Online profiling mechanisms and feedback-directed optimizations becoming mainstream
- …

Java advantages
Several studies show that Java could potentially be faster than C/C++, for several reasons:
- Pointers in C/C++ make optimization difficult
- Memory management is easier in Java than in C/C++, since Java allocates memory only through object instantiation; as a result, Java garbage collectors can achieve better cache locality
- Dynamic compilation of Java can use additional information available at run time to optimize code more effectively

Future of Opencj
Opencj aims to achieve better runtime performance by using a JVM as the execution environment:
- Static annotation with an annotation-aware JIT: runtime IPA
- Using a just-in-time compiler: apply more effective optimizations by profiling run-time information
- Using garbage collection: better performance due to cache locality
There are three steps in our schedule.

Framework, step 1 (architecture diagram; labels shown: C/C++/F and .java inputs, FE, IPL, IPA, BE (LNO, WOPT), CG, x86, IA, LWHIRL, .class input, Byte Code Reader, HIR ACTIONS, Whirl_to_LIR, LIR ACTIONS, JIT, Interp, runtime library, WHIRL Reader, IR Writer, C/C++; legend: Existing Module, New Module)

Framework, step 2 (architecture diagram; same labels as step 1, except that IPA is replaced by RIPA and a RIPA IR is added)

Framework, final (architecture diagram; same labels as step 2, with Whirl_to_LIR shown as W to LIR, plus the new labels HWHIRL, Runtime OPT., and Feedback)

Discussion
- Shin is the leader of this project
- Q&A
