Debugging java deployments_2

© 2011 IBM Corporation Server Resiliency - Debugging Java deployments Rohit Kelapure IBM Advisory Software Engineer 29 September 2011


JavaOne 2011 Server production JVM Debugging talk


Page 1: Debugging java deployments_2

© 2011 IBM Corporation

Server Resiliency - Debugging Java deployments

Rohit Kelapure

IBM Advisory Software Engineer 29 September 2011

Page 2: Debugging java deployments_2


Introduction to Speaker – Rohit Kelapure

Responsible for the resiliency of WebSphere Application Server

Team Lead and architect of Caching & Data replication features in WebSphere

Called upon to hose down fires & resolve critical situations

Customer advocate for large banks

Active blogger at All Things WebSphere

Apache OpenWebBeans committer

Java EE, OSGi & Spring developer


Linkedin

http://twitter.com/#!/rkela


Page 3: Debugging java deployments_2


Important Disclaimers

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES.

ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.

IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.

IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

- CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS


Page 4: Debugging java deployments_2


Copyright and Trademarks

© IBM Corporation 2011. All Rights Reserved.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., and registered in many jurisdictions worldwide.

Other product and service names might be trademarks of IBM or other companies.

A current list of IBM trademarks is available on the Web – see the IBM “Copyright and trademark information” page at URL: www.ibm.com/legal/copytrade.shtml


Page 5: Debugging java deployments_2


Outline

Server Resiliency Fundamentals

Common JVM Problems

Protecting your JVM
– Hung thread detection, thread interruption, thread hang recovery
– Memory leak detection, protection & action

Scenario based problem resolution

Tooling
– Eclipse Memory Analyzer
– Thread Dump Analyzer
– Garbage Collection and Memory Visualizer

Page 6: Debugging java deployments_2


Resiliency

The property of a material to absorb external energy when it is forced to deform elastically, and then to recover its original form and release that energy

Page 7: Debugging java deployments_2


Server Resiliency Concepts


1. Redundancy (data and processing)
– Create replicas
– High cost of initialization and reconfiguration
– Redundant elements need to be synchronized from time to time

2. Partition
– Split the data into smaller pieces and store them in a distributed fashion
– Allows for parallelization and divide and conquer
– Partial failure isolation

3. Virtualization
– Functionality of processing and data elements is virtualized as a service
– Loose coupling between the system and the services it consumes
– Integration by enforcing explicit boundaries and schema-based interfaces

4. Decentralized control
– Centralized control has a high communication overhead in a heavily redundant system
– Sometimes trapped in locally optimized solutions
– Fixing issues requires shutting down the entire system, e.g. the AWS outage

5. Explicit messaging
– Distributed shared memory: better consistency
– Message passing: loose coupling

6. Uniform interface
– E.g. the World Wide Web
– Better scalability, reusability and reliability
– Data, processes and other forms of computation are identified by one mechanism
– Semantics of the operations in messages for operating on the data are unified

7. Self-management
– Composed of self-managing components
– Managed elements, managers, sensors & effectors
– E.g. TCP/IP congestion control

Page 8: Debugging java deployments_2


Most common JVM Problem Scenarios

Functional problems
• Unexpected exceptions, compatibility

OOM errors, memory leaks
• Memory leaks: Java heap, native heap, classloaders

Hangs
• Synchronized resources, GC pause times, CPU contention

Crash
• JVM errors, JIT errors, JNI errors

High CPU
• Spin loops, liveness problems, livelock


Page 9: Debugging java deployments_2


Thread Hangs

Threading and synchronization issues are among the top 5 application performance challenges
– Being too aggressive with shared resources causes data inconsistencies
– Being too conservative leads to too much contention between threads

Application unresponsiveness
– Adding users, threads or CPUs causes the application to slow down (less throughput, worse response)
– High lock acquire times & contention
– Race conditions, deadlock, I/O under lock

Tooling is needed to rescue applications and the JVM from themselves
– Identify these conditions
– If possible, remedy them in the short term for server resiliency

Page 10: Debugging java deployments_2


JVM Hung Thread Detection

Every X seconds an alarm thread wakes up and iterates over all managed thread pools.

For each active thread, it subtracts the thread's "start time" from the current time and passes the result to a monitor.

A detection policy then determines, based on the available data, whether the thread is hung.

If it is, the stack trace of the hung thread is printed.
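The detection loop can be sketched roughly as follows. This is a minimal illustration, not WebSphere's implementation: the class name HungThreadMonitor, its methods and the fixed-threshold policy are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of an alarm-thread style hung thread check. */
final class HungThreadMonitor {
    // Start time (ms) of the task each pooled thread is currently running.
    private final Map<Thread, Long> startTimes = new ConcurrentHashMap<>();
    private final long thresholdMillis;

    HungThreadMonitor(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Called by a pooled thread when it picks up a task. */
    void taskStarted(Thread t, long nowMillis) {
        startTimes.put(t, nowMillis);
    }

    /** Called by a pooled thread when its task completes. */
    void taskFinished(Thread t) {
        startTimes.remove(t);
    }

    /** One pass of the alarm thread: threads active longer than the threshold. */
    List<Thread> findHungThreads(long nowMillis) {
        List<Thread> hung = new ArrayList<>();
        for (Map.Entry<Thread, Long> e : startTimes.entrySet()) {
            long activeFor = nowMillis - e.getValue(); // current time minus start time
            if (activeFor > thresholdMillis) {         // assumed policy: fixed threshold
                hung.add(e.getKey());
            }
        }
        return hung;
    }

    /** Prints the stack trace of each suspected hung thread. */
    void report(long nowMillis) {
        for (Thread t : findHungThreads(nowMillis)) {
            System.err.println("Thread \"" + t.getName() + "\" may be hung:");
            for (StackTraceElement frame : t.getStackTrace()) {
                System.err.println("    at " + frame);
            }
        }
    }
}
```

In a real server the alarm thread would call report(System.currentTimeMillis()) on a schedule, and the policy would likely use more data than a single threshold.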

Page 11: Debugging java deployments_2


Thread Interruption 101

Thread.stop() stops a thread by throwing a ThreadDeath exception (deprecated)

Thread.interrupt(): cooperative mechanism for a thread to signal another thread that it should, at its convenience and if it feels like it, stop what it is doing and do something else.

Interruption is usually the most sensible way to implement task cancellation.
– Because each thread has its own interruption policy, you should not interrupt a thread unless you know what interruption means to that thread.

Any method sensing interruption should
– Assume the current task is cancelled & perform some task-specific cleanup
– Exit as quickly and cleanly as possible, ensuring that callers are aware of the cancellation
  • Propagate the exception, making your method an interruptible blocking method: throw new InterruptedException()
  • Restore the interruption status so that code higher up on the call stack can deal with it: Thread.currentThread().interrupt()

Only code that implements a thread's interruption policy may swallow an interruption request.
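The two options for a method that senses interruption can be illustrated with a small sketch. The class and method names are hypothetical; only the java.util.concurrent calls are standard API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical examples of the two ways to handle InterruptedException. */
final class InterruptionIdioms {
    /** Option 1: propagate, making this an interruptible blocking method. */
    static String takeNext(BlockingQueue<String> queue) throws InterruptedException {
        return queue.take(); // throws InterruptedException if interrupted while waiting
    }

    /** Option 2: the signature cannot throw, so restore the interrupted status. */
    static String pollQuietly(BlockingQueue<String> queue) {
        try {
            return queue.poll(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            // Do not swallow the request: code higher up the stack must see it.
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Which option fits depends on whether the method's contract allows a checked exception. Either way the interruption request stays visible to the caller.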

Page 12: Debugging java deployments_2


Interrupting threads


Page 13: Debugging java deployments_2


Cancelling Threads

Page 14: Debugging java deployments_2


Dealing with Non‐interruptible Blocking

Many blocking library methods respond to interruption by returning early and throwing InterruptedException

This makes it easier to build tasks that are responsive to cancellation
– Lock.lockInterruptibly
– Thread.sleep
– Object.wait
– Thread.join

Not all blocking methods or blocking mechanisms are responsive to interruption

If a thread is blocked performing synchronous socket I/O, interruption has no effect other than setting the thread's interrupted status

If a thread is blocked waiting for an intrinsic lock, there is nothing you can do to stop it, short of ensuring that it eventually acquires the lock
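One way around the intrinsic-lock limitation, where the code can be changed, is to use an explicit lock acquired with lockInterruptibly(). The sketch below is illustrative only; the class name InterruptibleLockDemo is made up for the example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical demo: a thread waiting in lockInterruptibly() can be cancelled. */
final class InterruptibleLockDemo {
    /** Returns true if the waiting thread abandoned its wait when interrupted. */
    static boolean waiterRespondsToInterrupt() {
        ReentrantLock lock = new ReentrantLock();
        AtomicBoolean cancelled = new AtomicBoolean(false);
        lock.lock(); // the calling thread holds the lock, so the waiter cannot acquire it
        try {
            Thread waiter = new Thread(() -> {
                try {
                    lock.lockInterruptibly();
                    lock.unlock(); // only reached if the lock was acquired
                } catch (InterruptedException e) {
                    cancelled.set(true); // the wait was abandoned
                }
            }, "waiter");
            waiter.start();
            waiter.interrupt();  // request cancellation
            waiter.join(5000);   // wait for the waiter to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();
        }
        return cancelled.get();
    }

    public static void main(String[] args) {
        System.out.println("waiter cancelled: " + waiterRespondsToInterrupt());
    }
}
```

A thread blocked on a synchronized block in the same situation would stay blocked until the monitor is released.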

Page 15: Debugging java deployments_2


Thread Hang Recovery – Technique

Application specific hacks for thread hang recovery

Byte code instrumentation

Transform the concrete subclasses of the abstract classes InputStream & OutputStream to make the socket I/O operations interruptible.

Transform an application class so that every loop can be interrupted by invoking Interrupter.interrupt(Thread, boolean)

Transform a monitorenter instruction and a monitorexit instruction so that the wait at entering into a monitor is interruptible

http://www.ibm.com/developerworks/websphere/downloads/hungthread.html


Page 16: Debugging java deployments_2


Memory Leaks

Leaks come in various types, such as
– Memory leaks
– Thread and ThreadLocal leaks
– ClassLoader leaks
– System resource leaks
– Connection leaks

Customers want to increase application uptime without cycling the server.
– Frequent application restarts without stopping the server.

Frequent redeployments of the application result in OOM errors

What do we have today
– Offline post-mortem analysis of a JVM heap with tools like JRockit Mission Control and MAT. IEMA are the IBM Extensions for Memory Analyzer
– Runtime memory leak detection using JVMTI and PMI (Runtime Performance Advisor)

We don't have application-level, i.e. top-down, memory leak detection and protection
– Leak detection by looking at suspect patterns in application code

Page 17: Debugging java deployments_2


ClassLoader Leaks 101

A class is uniquely identified by
– Its name + the class loader that loaded it

A class with the same name can be loaded multiple times in a single JVM, each in a different class loader
– Web containers use this for isolating web applications
– Each web application gets its own class loader

Reference chain
– An object retains a reference to the class it is an instance of
– A class retains a reference to the class loader that loaded it
– The class loader retains a reference to every class it loaded

Retaining a reference to a single object from a web application pins every class loaded by the web application

These references often remain after a web application reload

With each reload, more classes get pinned, ultimately leading to an OOM
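The first two links of the reference chain are visible through plain reflection calls. A minimal sketch, with a made-up class name:

```java
/** Hypothetical helper that follows object -> class -> class loader. */
final class ClassLoaderChain {
    /** Returns the loader reachable from any object via its class. */
    static ClassLoader loaderOf(Object o) {
        return o.getClass().getClassLoader();
    }

    public static void main(String[] args) {
        Object appObject = new ClassLoaderChain();
        System.out.println("application object loaded by: " + loaderOf(appObject));
        // Core JRE classes come from the bootstrap loader, which is reported as null.
        System.out.println("java.lang.String loaded by: " + loaderOf("text"));
    }
}
```

Any long-lived structure outside the web application (a static field, a cache, a thread) that holds appObject therefore also keeps its class loader reachable.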

Page 18: Debugging java deployments_2


Tomcat pioneered approach - Leak Prevention

JRE triggered leak
– Singleton / static initializer
  • Can be a Thread
  • Something that won't get garbage collected
– Retains a reference to the context class loader when loaded
– If web application code triggers the initialization
  • The context class loader will be the web application class loader
  • A reference is created to the web application class loader
  • This reference is never garbage collected
  • Pins the class loader (and hence all the classes it loaded) in memory

Prevention with a DeployedObjectListener
– Call the various parts of the Java API that are known to retain a reference to the current context class loader
– Initialize these singletons when the Application Server's class loader is the context class loader
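The prevention step amounts to running the risky initializations early, under a safe context class loader. A rough sketch follows. ContextLoaderGuard is a hypothetical name, the system class loader stands in for the server's loader, and DriverManager.getDrivers() is used as one example of a JRE call that Tomcat's listener is understood to pre-initialize.

```java
/** Hypothetical helper: run JRE initializations under the server's class loader. */
final class ContextLoaderGuard {
    /** Runs init with serverLoader as context class loader, then restores the original. */
    static void runWithContextLoader(ClassLoader serverLoader, Runnable init) {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        current.setContextClassLoader(serverLoader);
        try {
            init.run();
        } finally {
            current.setContextClassLoader(original); // always restore
        }
    }

    public static void main(String[] args) {
        // Assumption: the system class loader plays the role of the server's loader.
        runWithContextLoader(ClassLoader.getSystemClassLoader(), () -> {
            // Triggers JDBC driver discovery once, before any web application does.
            java.sql.DriverManager.getDrivers();
        });
    }
}
```

Because the singleton is created before any web application code runs, whatever loader it captures is the server's and not a web application's.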

Page 19: Debugging java deployments_2


Leak Detection

Application Triggered Leaks
– ClassLoader
– Threads
– ThreadLocal
– JDBC Drivers
– Non Application
  • RMI Targets
  • Resource Bundle
  • Static final references
  • IntrospectionUtils
  • Loggers

Prevention
– Code executes when a web application is stopped, un-deployed or reloaded
– Check, via a combination of standard API calls and some reflection tricks, for known causes of memory leaks
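One of the simpler checks, looking for threads that still reference the stopped application's class loader, needs only standard API calls. The sketch below is an assumption about how such a check could look, not the actual detection code; ThreadLeakCheck is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check run when a web application is stopped. */
final class ThreadLeakCheck {
    /** Returns live threads whose context class loader is the application's loader. */
    static List<Thread> threadsPinning(ClassLoader appLoader) {
        List<Thread> suspects = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getContextClassLoader() == appLoader) {
                suspects.add(t); // this thread keeps the loader reachable
            }
        }
        return suspects;
    }
}
```

Checks for ThreadLocal values or JDBC drivers need reflection or driver deregistration and are not shown here.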

Page 20: Debugging java deployments_2


Memory leak detection console


Page 21: Debugging java deployments_2


What is wrong with my application …?

Why does my application run slow every time I do X ?

Why does my application have erratic response times ?

Why am I getting Out of Memory Errors ?

What is my application's memory footprint ?

Which parts of my application are CPU intensive ?

How did my JVM vanish without a trace ?

Why is my application unresponsive ?

What monitoring do I put in place for my app. ?


Page 22: Debugging java deployments_2


What is your JVM up to ?

Windows-style task manager that displays thread status and allows for thread recovery & interruption

Leverage the ThreadMXBean API in the JDK to display thread information
– https://github.com/kelapure/dynacache/blob/master/scripts/AllThreads.jsp
– https://github.com/kelapure/dynacache/blob/master/scripts/ViewThread.jsp
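A minimal sketch of the ThreadMXBean calls such a page would rely on is shown here. It is not the code of the linked JSPs; ThreadLister is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists live threads through the ThreadMXBean API. */
final class ThreadLister {
    /** Returns one "id name state" line per live thread. */
    static List<String> listThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), 0)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            lines.add(info.getThreadId() + " " + info.getThreadName()
                    + " " + info.getThreadState());
        }
        return lines;
    }

    public static void main(String[] args) {
        listThreads().forEach(System.out::println);
    }
}
```

Passing a larger depth instead of 0 to getThreadInfo also returns stack frames, which is what a per-thread detail view would need.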

Page 23: Debugging java deployments_2


Application runs slow when I do XXX ?

Understand the impact of the activity on components
– Look at the thread & method profiles
  • IBM Java Health Center
  • VisualVM
  • JRockit Mission Control

JVM method & dump trace: pinpoint performance problems.
– Shows entry & exit times of any Java method
  • Method trace to file for all methods in tests.mytest.package
– Allows taking a javadump, heapdump, etc. when a method is hit
  • Dump a javacore when method testInnerMethod in an inner class TestInnerClass of a class TestClass is called
– Use BTrace, -Xtrace and -Xdump to trigger dumps on a range of events
  • gpf, user, abort, fullgc, slow, allocation, thrstop, throw …
  • Stack traces, tool launching


Page 24: Debugging java deployments_2


Application has erratic response times ?

Verbose GC should be enabled by default
– <2% impact on performance

VisualGC, GCMV & PMAT: visualize GC output
– In-use space after GC
  • Positive gradient over time indicates a memory leak
  • Increased load (use for capacity planning)
  • Memory leak (take heapdumps for problem determination)

Choose the right GC policy
– Optimized for "batch" type applications, consistent allocation profile
– Tight responsiveness criteria, allocations of large objects
– High rates of object "burn", large # of transitional objects
– 12, 16 core SMP systems with allocation contention (AIX only)

GC overhead > 10%: wrong policy or more tuning needed

Enable compressed references for 64-bit JVM
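The 10% overhead rule can be approximated at runtime from the standard management beans. This is a rough sketch under stated assumptions: GcOverhead is a hypothetical name, and the collection time reported by a JVM may include concurrent work as well as pauses, so verbose GC logs remain the more precise source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper estimating the share of JVM uptime spent in GC. */
final class GcOverhead {
    /** GC time as a percentage of uptime. */
    static double overheadPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    /** Sums the accumulated collection time over all collectors of this JVM. */
    static double currentOverheadPercent() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return overheadPercent(gcMillis, uptime);
    }

    public static void main(String[] args) {
        double pct = currentOverheadPercent();
        System.out.println("GC overhead: " + pct + "%" + (pct > 10.0 ? " (review policy/tuning)" : ""));
    }
}
```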

Page 25: Debugging java deployments_2


Out Of Memory Errors ?

JVM heap sized incorrectly
– GC adapts heap size to keep occupancy within [40, 70]%

Determine heap occupancy of the app under load
– Xmx = 43% larger than max. occupancy of app
  • For 700MB occupancy, a 1000MB max. heap is required (700 + 43% of 700)

Analyze heapdumps & system dumps with tools like Eclipse Memory Analyzer
– Lack of Java heap or native heap
– Eclipse Memory Analyzer and IBM extensions

Finding which methods allocated large objects
– Prints stack trace for all objects above 1K

Enable Java heap and native heap monitoring
– JMX and metrics output by the JVM

Classloader exhaustion
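The 43% figure follows from the 70% occupancy ceiling: a heap that is 70% full is 1/0.70, or about 1.43, times its occupancy. The sketch below works through that arithmetic and reads current heap occupancy over JMX. HeapSizing and its method names are assumptions made for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper for the heap sizing rule and basic heap monitoring. */
final class HeapSizing {
    /** Smallest max heap (MB) that keeps the given occupancy at or below 70%. */
    static long recommendedMaxHeapMb(long occupancyMb) {
        return (occupancyMb * 100 + 69) / 70; // integer ceiling of occupancy / 0.70
    }

    /** Current heap occupancy as a percentage of the committed heap. */
    static double currentOccupancyPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return 100.0 * heap.getUsed() / heap.getCommitted();
    }

    public static void main(String[] args) {
        System.out.println("Max heap for 700MB occupancy: " + recommendedMaxHeapMb(700) + "MB");
        System.out.println("Current occupancy: " + currentOccupancyPercent() + "%");
    }
}
```

Native heap usage is not visible through MemoryMXBean and has to be tracked at the OS process level.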


Page 26: Debugging java deployments_2


Application's memory footprint ?

HPROF: profiler shipped with the JDK, uses JVMTI
– Analysis of memory usage: -Xrunhprof:heap=all

Performance Inspector tools: JPROF Java Profiling Agent
– Capture state of the Java heap, later processed by HDUMP

Group a system dump by classloader
– Since each app has its own classloader, you can get accurate information on how much heap each application is taking up

Use MAT to investigate heapdumps & system dumps
– Find large clumps, inspect those objects, what retains them ?
  • Why is this object not being garbage collected: List Objects > incoming refs, Path to GC roots, Immediate dominators
  • Limit analysis to a single application in a JEE environment: Dominator tree grouped by Class Loader
  • Set of objects that can be reclaimed if we could delete X: Retained Size graphs
  • Traditional memory hogs like HTTPSession, Cache: use Object Query Language (OQL)


Page 27: Debugging java deployments_2


Using Javacores for Troubleshooting

Javacores are often the most critical piece of information to resolve a hang, high CPU, crash and sometimes memory problems

A Javacore is a text file that contains a lot of useful information
– The date, time, Java™ version, full command path and arguments
– All the threads in the JVM, including thread state, priority, thread ID, name
– Thread call stacks

Javacores can be generated automatically or on demand
– Automatically when an OutOfMemoryError is thrown
– On demand with “kill -3 <pid>”

A message is written to SystemOut when a javacore is generated

"WebContainer : 537" (TID:0x088C7200, sys_thread_t:0x09C19F00, state:CW, native ID:0x000070E8) prio=5
    at java/net/SocketInputStream.socketRead0(Native Method)
    at java/net/SocketInputStream.read(SocketInputStream.java:155)
    at oracle/net/ns/Packet.receive(Bytecode PC:31)
    at oracle/net/ns/DataPacket.receive(Bytecode PC:1)
    at oracle/net/ns/NetInputStream.read(Bytecode PC:33)
    at oracle/jdbc/driver/T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1123)
    at oracle/jdbc/driver/T4C8Oall.receive(T4C8Oall.java:480)
    at oracle/jdbc/driver/T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:813)
    at oracle/jdbc/driver/OracleStatement.doExecuteWithTimeout(OracleStatement.java:1154)
    at oracle/jdbc/driver/OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3415)
    at com/ibm/commerce/user/objects/EJSJDBCPersisterCMPDemographicsBean_2bcaa7a2.load()
    at com/ibm/ejs/container/ContainerManagedBeanO.load(ContainerManagedBeanO.java:1018)
    at com/ibm/ejs/container/EJSHome.activateBean(EJSHome.java:1718)

Page 28: Debugging java deployments_2


CPU intensive parts of the app?

Thread dumps or Javacores: poor man's profiler
– Periodic javacores
– Thread analysis using the Thread Monitor Dump Analyzer tool

High CPU is typically diagnosed by comparing two key pieces of information
– Using Javacores, determine what code the threads are executing
– Gather CPU usage statistics by thread

For each Javacore compare the call stacks between threads
– Focus on request processing threads first
– Are all the threads doing similar work?

Are the threads moving ?

Collect CPU statistics per thread

Is there one thread consuming most of the CPU?

Are there many active threads each consuming a small percentage of CPU?

High CPU due to excessive garbage collection ?

If this is a load/capacity problem then use the HPROF profiler
– -Xrunhprof:cpu=samples, -Xrunhprof:cpu=times
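Per-thread CPU statistics can also be gathered from inside the JVM with ThreadMXBean, where the JVM supports it. The sketch below is an illustration with a made-up class name; OS-level tools remain an alternative when this support is missing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that ranks live threads by accumulated CPU time. */
final class ThreadCpuSampler {
    /** Returns "cpuMillis ms  threadName" lines, busiest thread first. */
    static List<String> cpuByThread() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return lines; // this JVM cannot report per-thread CPU time
        }
        List<long[]> samples = new ArrayList<>(); // each entry: {cpuNanos, threadId}
        for (long id : mx.getAllThreadIds()) {
            long nanos = mx.getThreadCpuTime(id); // -1 if the thread has ended
            if (nanos >= 0) {
                samples.add(new long[] {nanos, id});
            }
        }
        samples.sort((a, b) -> Long.compare(b[0], a[0])); // highest CPU first
        for (long[] s : samples) {
            ThreadInfo info = mx.getThreadInfo(s[1]);
            if (info != null) {
                lines.add((s[0] / 1_000_000) + " ms  " + info.getThreadName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        cpuByThread().forEach(System.out::println);
    }
}
```

Taking two samples a few seconds apart and comparing them shows which threads are consuming CPU now, as opposed to since startup.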


Page 29: Debugging java deployments_2


Diagnosis - Hangs

Often hangs are due to unresponsive synchronous requests
– SMTP Server, Database, Map Service, Store Locator, Inventory, Order processing, etc

3XMTHREADINFO "Servlet.Engine.Transports : 11" (TID:0x7DD38040, sys_thread_t:0x44618828, state:R, native ID:0x4A9F) prio=5
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.SQLExecute()
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.execute2(DB2PreparedStatement.java)
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.executeQuery(DB2PreparedStatement.java()
4XESTACKTRACE at ...
3XMTHREADINFO "Servlet.Engine.Transports : 12" (TID:0x7DD37FC0, sys_thread_t:0x4461BDA8, state:R, native ID:0x4BA0) prio=5
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.SQLExecute()
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.execute2(DB2PreparedStatement.java)
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.executeQuery(DB2PreparedStatement.java()
4XESTACKTRACE at ...
3XMTHREADINFO "Servlet.Engine.Transports : 13" (TID:0x7DD34C50, sys_thread_t:0x4465B028, state:R, native ID:0x4CCF) prio=5
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.SQLExecute()
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.execute2(DB2PreparedStatement.java)
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.executeQuery(DB2PreparedStatement.java()

Not all hangs are waiting on an external resource
– A JVM can hang due to a synchronization problem: one thread blocking several others


3XMTHREADINFO "Servlet.Engine.Transports : 11" (TID:0x7DD38040, sys_thread_t:0x44618828, state:R, native ID:0x4A9F) prio=5
3LKMONOBJECT com/ibm/ws/cache/Cache@0x65FB8788/0x65FB8794: owner "Default : DMN0" (0x355B4800)
3LKWAITERQ Waiting to enter:
3LKWAITER "WebContainer : 0" (0x3ACCD000)
3LKWAITER "WebContainer : 1" (0x3ACCCB00)
3LKWAITER "WebContainer : 2" (0x38D68300)
3LKWAITER "WebContainer : 3" (0x38D68800)
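The same owner and waiter relationship that the javacore's lock section shows can be queried at runtime through ThreadInfo. The sketch below sets up a small contention case on purpose so that it can report the owner; LockOwnerProbe is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

/** Hypothetical probe: who owns the monitor a blocked thread is waiting for? */
final class LockOwnerProbe {
    /** Blocks a helper thread on a monitor held by the caller and returns the reported owner. */
    static String ownerSeenByBlockedThread() {
        final Object monitor = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                // entered only after the caller releases the monitor
            }
        }, "blocked-waiter");
        waiter.setDaemon(true);
        synchronized (monitor) {
            waiter.start();
            long deadline = System.currentTimeMillis() + 5000;
            // Wait until the helper is actually blocked on the monitor.
            while (waiter.getState() != Thread.State.BLOCKED
                    && System.currentTimeMillis() < deadline) {
                Thread.yield();
            }
            ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(waiter.getId());
            return info == null ? null : info.getLockOwnerName();
        }
    }

    public static void main(String[] args) {
        System.out.println("monitor owner: " + ownerSeenByBlockedThread());
    }
}
```

Applied to every thread in the BLOCKED state, the same two calls give a runtime view of which thread is blocking several others.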

Page 30: Debugging java deployments_2


How did my JVM vanish without trace ?

JVM process crash: usual suspects
– Bad JNI calls, segmentation violations, call stack overflow
– Native memory leaks: object allocation fails with sufficient space in the JVM heap
– Unexpected OS exceptions (out of disk space, file handles), JIT failures

Monitor the OS process size

Runtime check of JVM memory allocations: -Xcheck:memory

Native memory usage: create a core dump on an OOM

JNI code static analysis: -Xcheck:jni (errors, warnings, advice)

GCMV provides scripts and graphing for native memory
– Windows “perfmon”, Linux “ps” & AIX “svmon”

Find the last stack of native code executing on the thread during the crash

The signal info (1TISIGINFO) will show the Javacore was created due to a crash
– Signal 11 (SIGSEGV) or GPF


Page 31: Debugging java deployments_2


What do I monitor ?


Page 32: Debugging java deployments_2


Top Malpractices

No arch. plan

No migration plan

No change records

No Capacity plan

No Production traffic profile

Changes put directly in Prod.

No load & Stress testing

Communication breakdown

No education

Application Error

Test environment != Production


Page 33: Debugging java deployments_2


Support Assistant Workbench to help with Problem Determination


Page 34: Debugging java deployments_2


One stop shop for tools to analyze JVM issues


Page 35: Debugging java deployments_2


Tools

Problem | Artifact | Monitoring & Analysis

Memory leaks, Out of Memory errors, application unresponsive | Verbose garbage collection log (native_stdout.log) | GCMV, VisualGC, jps, jstat, jstatd, jinfo

High CPU, crash, hang, performance bottleneck, unexpected termination | Javadump, Javacore (javacore*.txt) | Thread Monitor & Dump Analyzer (TMDA)

Lock contention, low CPU at high load | Threads (connection to running JVM) | Sun VisualVM, JConsole, IBM Health Center, JRockit Mission Control

Memory leak, Out of Memory errors | Heapdump (*.phd, *.txt, *.hprof) | MAT, HeapAnalyzer, JHat

Native memory leak, anomalies, unexpected crash | System or core dump (core.dmp, user.dmp); files must be processed with the jextract tool | Monitor: GCMV; Examine: pmap & VMMap; Track: DebugDiag, libumem, valgrind, cmalloc & NJAMD


Page 36: Debugging java deployments_2


Runtime Serviceability aids

Troubleshooting panels in the administration console

Performance Monitoring Infrastructure metrics

Diagnostic Provider MBeans
– Dump configuration, state and run self-test

Application Response Measurement/Request Metrics
– Follow a transaction end-to-end and find bottlenecks

Trace logs & First Failure Data Capture

Runtime Performance Advisors
– Memory leak detection, session size, …

Specialized tracing and runtime checks
– Tomcat Classloader Leak Detection
– Session crossover, connection leak, ByteBuffer leak detection
– Runaway CPU thread protection


Page 37: Debugging java deployments_2


References

Java theory and practice: Dealing with InterruptedException
– http://www.ibm.com/developerworks/java/library/j-jtp05236/index.html

Architectural design for resilience
– http://dx.doi.org/10.1080/17517570903067751

IBM Support Assistant
– http://www-01.ibm.com/software/support/isa/download.html

How Customers get into trouble
– http://www-01.ibm.com/support/docview.wss?uid=swg27008359


Page 38: Debugging java deployments_2


Q&A

Thank You
