Highly Scalable Java Programming for Multi-Core System

Zhi Gan ([email protected]) http://ganzhi.blogspot.com

description

This is a list of Java programming techniques that can be used to improve the scalability of Java applications.

Transcript of Highly Scalable Java Programming for Multi-Core System

Page 1: Highly Scalable Java Programming for Multi-Core System

Zhi Gan ([email protected])

http://ganzhi.blogspot.com

Highly Scalable Java Programming for Multi-Core System

Page 2: Highly Scalable Java Programming for Multi-Core System

Agenda

• Software Challenges

• Profiling Tools Introduction

• Best Practice for Java Programming

• Rocket Science: Lock-Free Programming

Page 3: Highly Scalable Java Programming for Multi-Core System

Software Challenges

• Parallelism
– More threads per system = more parallelism needed to achieve high utilization
– Thread-to-thread affinity (shared code and/or data)
• Memory management
– Sharing of cache and memory bandwidth across more threads = greater need for memory efficiency
– Thread-to-memory affinity (execute thread closest to associated data)
• Storage management
– Allocate data across DRAM, Disk & Flash according to access frequency and patterns

Page 4: Highly Scalable Java Programming for Multi-Core System

Typical Scalability Curve

Page 5: Highly Scalable Java Programming for Multi-Core System

The 1st Step: Profiling Parallel Application

Page 6: Highly Scalable Java Programming for Multi-Core System

Important Profiling Tools

• Java Lock Monitor (JLM)
– helps developers understand how locks are used in their applications
– similar tool: Java Lock Analyzer (JLA)
• Multi-core SDK (MSDK)
– in-depth analysis of the complete execution stack
• AIX Performance Tools
– Simple Performance Lock Analysis Tool (SPLAT)
– XProfiler
– prof, tprof and gprof

Page 7: Highly Scalable Java Programming for Multi-Core System

Tprof and VPA tool

Page 8: Highly Scalable Java Programming for Multi-Core System

Java Lock Monitor

• %MISS : 100 * SLOW / NONREC
• GETS : Lock Entries
• NONREC : Non Recursive Gets
• SLOW : Non Recursives that Wait
• REC : Recursive Gets
• TIER2 : SMP: Total try-enter spin loop cnt (middle for 3 tier)
• TIER3 : SMP: Total yield spin loop cnt (outer for 3 tier)
• %UTIL : 100 * Hold-Time / Total-Time
• AVER_HTM : Hold-Time / NONREC

Page 9: Highly Scalable Java Programming for Multi-Core System

Multi-core SDK

Dead Lock View

Synchronization View

Page 10: Highly Scalable Java Programming for Multi-Core System

Best Practices for Highly Scalable Java Programming

Page 11: Highly Scalable Java Programming for Multi-Core System

What Is Lock Contention?

From JLM tool website

Page 12: Highly Scalable Java Programming for Multi-Core System

Lock Operation Itself Is Expensive

• CAS (compare-and-swap) operations are predominantly used to implement locking

• The lock operation itself takes up a large part of the execution time
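To make the cost concrete, here is a minimal sketch of a lock built directly on CAS. The class CasSpinLock and its methods are hypothetical names for illustration only; this is not how the JVM implements monitors, just a way to see that every acquisition attempt is an atomic operation on shared memory.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration class; not taken from the slides or from any JVM.
class CasSpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    // One acquisition attempt is one compare-and-swap on the shared flag.
    boolean tryLock() {
        return held.compareAndSet(false, true);
    }

    void lock() {
        // Under contention the CAS is repeated until it succeeds.
        while (!tryLock()) {
            Thread.yield(); // give the current holder a chance to run
        }
    }

    void unlock() {
        held.set(false); // volatile write releases the lock
    }
}
```

Even when there is no contention, lock() still pays for one CAS, which is why shortening or avoiding lock operations matters.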

Page 13: Highly Scalable Java Programming for Multi-Core System

Reduce Locking Scope

public synchronized void foo1(int k) {
    // The whole method runs under the lock, including the string work.
    String key = Integer.toString(k);
    String value = key + "value";
    if (null == key) {
        return;
    } else {
        maph.put(key, value);
    }
}

Execution Time: 16106 milliseconds

public void foo2(int k) {
    // The string work runs outside the lock.
    String key = Integer.toString(k);
    String value = key + "value";
    if (null == key) {
        return;
    } else {
        // Only the update of the shared map is locked.
        synchronized (this) {
            maph.put(key, value);
        }
    }
}

Execution Time: 12157 milliseconds

Execution time reduced by about 25%

Page 14: Highly Scalable Java Programming for Multi-Core System

Results from JLM report

Reduced AVER_HTM

Page 15: Highly Scalable Java Programming for Multi-Core System

Lock Splitting

// Both methods share the lock on "this", so users and queries block each other.
public synchronized void addUser1(String u) {
    users.add(u);
}

public synchronized void addQuery1(String q) {
    queries.add(q);
}

Execution Time: 12981 milliseconds

// Each collection is guarded by its own lock, so the two methods no longer contend.
public void addUser2(String u) {
    synchronized (users) {
        users.add(u);
    }
}

public void addQuery2(String q) {
    synchronized (queries) {
        queries.add(q);
    }
}

Execution Time: 4797 milliseconds

Execution time reduced by about 63%

Page 16: Highly Scalable Java Programming for Multi-Core System

Result from JLM report

Reduced lock tries

Page 17: Highly Scalable Java Programming for Multi-Core System

Lock Striping

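// One lock guards the whole array.
public synchronized void put1(int indx, String k) {
    share[indx] = k;
}

Execution Time: 5536 milliseconds

// The array is guarded by N_LOCKS locks; an index maps to lock indx % N_LOCKS.
public void put2(int indx, String k) {
    synchronized (locks[indx % N_LOCKS]) {
        share[indx] = k;
    }
}

Execution Time: 1857 milliseconds

Execution time reduced by about 66%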
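The put2 method relies on a locks array and an N_LOCKS constant that the slide does not show. A self-contained sketch might look like the following; the class name StripedArray, the stripe count of 16, and the get method are assumptions added for illustration.

```java
// Hypothetical self-contained version of put2; names and sizes are assumptions.
class StripedArray {
    private static final int N_LOCKS = 16;            // assumed stripe count
    private final Object[] locks = new Object[N_LOCKS];
    private final String[] share;

    StripedArray(int size) {
        share = new String[size];
        for (int i = 0; i < N_LOCKS; i++) {
            locks[i] = new Object();                   // one monitor per stripe
        }
    }

    void put(int indx, String k) {
        synchronized (locks[indx % N_LOCKS]) {         // only same-stripe writers contend
            share[indx] = k;
        }
    }

    String get(int indx) {
        synchronized (locks[indx % N_LOCKS]) {         // same stripe lock makes the write visible
            return share[indx];
        }
    }
}
```

The stripe count is a tuning choice: more stripes mean less contention but more lock objects, and any operation that needs the whole array has to take every stripe lock.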

Page 18: Highly Scalable Java Programming for Multi-Core System

Result from JLM report

More locks with less AVER_HTM

Page 19: Highly Scalable Java Programming for Multi-Core System

Split Hot Spots: Scalable Counter

– ConcurrentHashMap maintains an independent counter for each segment of the hash map, and uses a lock for each counter

– the global count is obtained by summing all the independent counters
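The per-segment counter idea can be sketched outside of a hash map as well. The class SegmentedCounter below is a hypothetical illustration, not the ConcurrentHashMap source: each segment has its own lock and count, and the total is computed by summing the segments.

```java
// Hypothetical illustration of a counter split across independently locked segments.
class SegmentedCounter {
    private final Object[] locks;
    private final long[] counts;

    SegmentedCounter(int segments) {
        locks = new Object[segments];
        counts = new long[segments];
        for (int i = 0; i < segments; i++) {
            locks[i] = new Object();
        }
    }

    // The key selects the segment, so threads working on different keys rarely collide.
    void increment(int key) {
        int s = (key & 0x7fffffff) % counts.length;
        synchronized (locks[s]) {
            counts[s]++;
        }
    }

    // Sums the segments one by one; under concurrent updates this is not an atomic snapshot.
    long sum() {
        long total = 0;
        for (int s = 0; s < counts.length; s++) {
            synchronized (locks[s]) {
                total += counts[s];
            }
        }
        return total;
    }
}
```

The trade-off is that increments become cheap and mostly uncontended while reading the total becomes slower and only approximately consistent. java.util.concurrent.atomic.LongAdder (Java 8 and later) applies the same idea with CAS instead of locks.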

Page 20: Highly Scalable Java Programming for Multi-Core System

Alternatives to Exclusive Lock

• Duplicate the shared resource if possible
• Atomic variables
– counter, sequential number generator, head pointer of a linked list
• Concurrent container
– java.util.concurrent package, Amino lib
• Read-Write Lock
– java.util.concurrent.locks.ReadWriteLock
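As an illustration of the read-write lock alternative, here is a minimal sketch of a read-mostly cache. The class ReadMostlyCache is a hypothetical name; ReadWriteLock and ReentrantReadWriteLock are the standard java.util.concurrent.locks types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical example class: many readers may proceed together, writers are exclusive.
class ReadMostlyCache {
    private final Map<String, String> map = new HashMap<>();
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    String get(String key) {
        rw.readLock().lock();          // shared: does not block other readers
        try {
            return map.get(key);
        } finally {
            rw.readLock().unlock();
        }
    }

    void put(String key, String value) {
        rw.writeLock().lock();         // exclusive: waits for readers and writers to leave
        try {
            map.put(key, value);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

This tends to pay off only when reads clearly outnumber writes; with frequent writes the extra bookkeeping of a read-write lock can make it slower than a plain exclusive lock.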

Page 21: Highly Scalable Java Programming for Multi-Core System

Example of AtomicLongArray

// Every read and write of the array takes the lock on "this".
public synchronized void set1(int idx, long val) {
    d[idx] = val;
}

public synchronized long get1(int idx) {
    long ret = d[idx];
    return ret;
}

Execution Time: 23550 milliseconds

private final AtomicLongArray a;

public void set2(int idx, long val) {
    // set() stores val like set1 does; addAndGet() would add val to the element instead.
    a.set(idx, val);
}

public long get2(int idx) {
    long ret = a.get(idx);
    return ret;
}

Execution Time: 842 milliseconds

Execution time reduced by about 96%

Page 22: Highly Scalable Java Programming for Multi-Core System

Using Concurrent Containers

• java.util.concurrent package
– since Java 1.5
– ConcurrentHashMap, ConcurrentLinkedQueue, CopyOnWriteArrayList, etc.
• Amino Lib is another good choice
– LockFreeList, LockFreeStack, LockFreeQueue, etc.
• Thread-safe containers
• Optimized for common operations
• High performance and scalability on multi-core platforms
• Drawback: full feature support is not provided
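A short sketch of what using a concurrent container looks like in practice. The class WordCounter is hypothetical, and the merge and getOrDefault calls assume Java 8 or later; on Java 1.5 to 7 the same effect needs putIfAbsent with a retry loop.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example: a shared word count with no explicit locking in user code.
class WordCounter {
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    void record(String word) {
        // merge() performs the read-modify-write atomically for this key (Java 8+).
        counts.merge(word, 1, Integer::sum);
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }
}
```

Note that calling get followed by put from user code would not be atomic even on a concurrent map; the compound operations the container offers are what make it safe.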

Page 23: Highly Scalable Java Programming for Multi-Core System

Using Immutable and Thread Local data

• Immutable data
– remains unchanged throughout its life cycle
– always thread-safe
• Thread Local data
– used by only a single thread
– not shared among different threads
– can replace a global waiting queue or object pool
– used in work-stealing schedulers
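Both ideas can be sketched in a few lines. The classes Money and PerThreadTally are hypothetical examples, and ThreadLocal.withInitial assumes Java 8 or later.

```java
// Immutable: all fields are final and "changing" a value returns a new object,
// so instances can be shared between threads without any locking.
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    Money plus(long moreCents) {
        return new Money(currency, cents + moreCents);
    }

    String currency() { return currency; }
    long cents() { return cents; }
}

// Thread-local: each thread gets its own tally, so updates never contend.
class PerThreadTally {
    private static final ThreadLocal<long[]> TALLY =
            ThreadLocal.withInitial(() -> new long[1]);

    static void hit() {
        TALLY.get()[0]++;
    }

    static long hitsOnThisThread() {
        return TALLY.get()[0];
    }
}
```

A per-thread tally only gives a per-thread view; producing a global total still requires some way of collecting the values from all threads.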

Page 24: Highly Scalable Java Programming for Multi-Core System

Reduce Memory Allocation

• JVM: two levels of memory allocation
– first from a thread-local buffer
– then from the global buffer
• The thread-local buffer is exhausted quickly if allocation is frequent
• The ThreadLocal class may help if a temporary object is needed in a loop
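One way the ThreadLocal suggestion could be applied is to reuse a per-thread scratch buffer instead of allocating one on every iteration. The class RowFormatter and the buffer size of 256 are assumptions for illustration; ThreadLocal.withInitial assumes Java 8 or later.

```java
// Hypothetical example: a per-thread StringBuilder reused across calls.
class RowFormatter {
    private static final ThreadLocal<StringBuilder> BUF =
            ThreadLocal.withInitial(() -> new StringBuilder(256));

    static String format(int id, String name) {
        StringBuilder sb = BUF.get();
        sb.setLength(0);                 // reset the buffer instead of allocating a new one
        sb.append(id).append(':').append(name);
        return sb.toString();            // the resulting String is still a new object
    }
}
```

Whether this beats plain allocation depends on the JVM and the workload, so it is worth measuring with a profiler before adopting it.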

Page 25: Highly Scalable Java Programming for Multi-Core System

Rocket Science: Lock-Free Programming

Page 26: Highly Scalable Java Programming for Multi-Core System

Using Lock-Free/Wait-Free Algorithm

• Lock-free algorithms allow concurrent updates of shared data structures without using any locking mechanisms
– solves some of the basic problems associated with using locks in the code
– helps create algorithms that show good scalability
• Highly scalable and efficient
• Amino Lib
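For a feel of what a lock-free data structure looks like, here is a minimal sketch of the classic Treiber stack built on AtomicReference. The class name TreiberStackSketch is hypothetical and this is not the Amino Lib implementation.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a Treiber stack; not the Amino Lib LockFreeStack.
class TreiberStackSketch<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<>(value, oldHead);
        } while (!head.compareAndSet(oldHead, newHead)); // retry if another thread moved head
    }

    // Returns null when the stack is empty.
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) {
                return null;
            }
        } while (!head.compareAndSet(oldHead, oldHead.next)); // retry if head changed
        return oldHead.value;
    }
}
```

No thread ever blocks while holding the structure: a thread that loses the CAS simply rereads head and tries again.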

Page 27: Highly Scalable Java Programming for Multi-Core System

Why Lock-Free Often Means Better Scalability? (I)

Lock: All threads wait for one

Lock free: No wait, but only one can succeed; other threads need to retry

Page 28: Highly Scalable Java Programming for Multi-Core System

Why Lock-Free Often Means Better Scalability? (II)

Lock: All threads wait for one

Lock free: No wait, but only one can succeed; other threads often need to retry
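The "only one can succeed, the others retry" behaviour can be observed with a small experiment. The class CasRetryCounter and its runDemo helper are hypothetical; the counter records how many CAS attempts failed and had to be repeated.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo: a CAS-based counter that also counts its failed attempts.
class CasRetryCounter {
    private final AtomicInteger value = new AtomicInteger();
    private final AtomicLong failedAttempts = new AtomicLong();

    void increment() {
        for (;;) {
            int seen = value.get();
            if (value.compareAndSet(seen, seen + 1)) {
                return;                          // this thread won the race
            }
            failedAttempts.incrementAndGet();    // another thread won; retry
        }
    }

    int get() { return value.get(); }

    long failedAttempts() { return failedAttempts.get(); }

    // Runs the given number of threads, each incrementing perThread times.
    static CasRetryCounter runDemo(int threads, int perThread) {
        CasRetryCounter c = new CasRetryCounter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    c.increment();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c;
    }
}
```

The final value is always threads * perThread, while the number of failed attempts varies from run to run and grows with contention. A failed attempt costs only a retry, not a blocked thread, and every failure means some other thread made progress.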

Page 29: Highly Scalable Java Programming for Multi-Core System

Performance of A Lock-Free Stack

Picture from: http://www.infoq.com/articles/scalable-java-components

Page 30: Highly Scalable Java Programming for Multi-Core System

References

• Amino Lib – http://amino-cbbs.sourceforge.net/

• MSDK – http://www.alphaworks.ibm.com/tech/msdk

• JLA – http://www.alphaworks.ibm.com/tech/jla

Page 31: Highly Scalable Java Programming for Multi-Core System

Backup