Debugging java deployments_2
© 2011 IBM Corporation
Server Resiliency - Debugging Java deployments
Rohit Kelapure
IBM Advisory Software Engineer 29 September 2011
Introduction to Speaker – Rohit Kelapure
Responsible for the resiliency of WebSphere Application Server
Team Lead and architect of Caching & Data replication features in WebSphere
Called upon to hose down fires & resolve critical situations
Customer advocate for large banks
Active blogger at All Things WebSphere
Apache OpenWebBeans committer
Java EE, OSGi & Spring developer
http://twitter.com/#!/rkela
Important Disclaimers
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES.
ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.
IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
- CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS
Copyright and Trademarks
© IBM Corporation 2011. All Rights Reserved.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., and registered in many jurisdictions worldwide.
Other product and service names might be trademarks of IBM or other companies.
A current list of IBM trademarks is available on the Web – see the IBM “Copyright and trademark information” page at URL: www.ibm.com/legal/copytrade.shtml
Outline
Server Resiliency Fundamentals
Common JVM Problems
Protecting your JVM
– Hung thread detection, thread interruption, thread hang recovery
– Memory leak detection, protection & action
Scenario based problem resolution
Tooling
– Eclipse Memory Analyzer
– Thread Dump Analyzer
– Garbage Collection and Memory Visualizer
Resiliency
Property of a material that can absorb external energy when it is forced to deform elastically, and then be able to recover to its original form and release the energy
Server Resiliency Concepts
1. Redundancy (data and processing)
– Create replicas
– High cost of initialization and reconfiguration
– Redundant elements need to be synchronized from time to time
2. Partition
– Splitting the data into smaller pieces and storing them in distributed fashion
– Allows for parallelization & divide and conquer
– Partial failure isolation
3. Virtualization
– Functionalities of processing and data element virtualized as a service
– Loose coupling between system and consumed services
– Integration by enforcing explicit boundaries and schema-based interfaces
4. Decentralized Control
– High communication overhead of centralized control for a system of heavy redundancy
– Sometimes trapped in locally optimized solutions
– Fixing issues requires shutting down the entire system e.g. AWS outage
5. Explicit Messaging
– Distributed shared memory: better consistency
– Message passing: loose coupling
6. Uniform Interface
– E.g. World Wide Web
– Better scalability, reusability and reliability
– Data, process and other forms of computations identified by one mechanism
– Semantics of operations in messages for operating on the data are unified
7. Self Management
– Composed of self-managing components
– Managed element, managers, sensors & effectors
– E.g. TCP/IP congestion control
Most common JVM Problem Scenarios
Functional Problems
• Unexpected exceptions, compatibility
OOM Errors, Memory Leaks
• Memory leaks: Java heap, native heap, classloaders
Hangs
• Synchronized resources, GC pause times, CPU contention
Crash
• JVM errors, JIT errors, JNI errors
High CPU
• Spin loops, liveness problems, livelock
Thread Hangs
Threading and synchronization issues are among the top 5 application performance challenges
– Being too aggressive with shared resources causes data inconsistencies
– Being too conservative leads to too much contention between threads
Application unresponsiveness
– Adding users / threads / CPUs causes the app to slow down (less throughput, worse response)
– High lock acquire times & contention
– Race conditions, deadlock, I/O under lock
Tooling is needed to rescue applications and the JVM from themselves
– Identify these conditions
– If possible, remedy them in the short term for server resiliency
JVM Hung Thread Detection
Every X seconds an alarm thread wakes up and iterates over all managed thread pools.
For each thread, it subtracts the thread's "start time" from the current time and passes the result to a monitor.
A detection policy then determines, based on the available data, whether the thread is hung.
Print stack trace of the hung thread
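The detection loop described on this slide can be sketched roughly as follows. This is a minimal illustration, not the WebSphere implementation: the class name HungThreadMonitor, its methods, and the fixed-threshold policy are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical monitor; the names and the fixed-threshold policy are illustrative only. */
final class HungThreadMonitor {
    // Start time (ms) of the task each managed thread is currently running.
    private final Map<Thread, Long> startTimes = new ConcurrentHashMap<>();
    private final long thresholdMillis;

    HungThreadMonitor(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Called when a pool thread picks up a task. */
    void taskStarted(Thread t, long nowMillis) {
        startTimes.put(t, nowMillis);
    }

    /** Called when a pool thread finishes its task. */
    void taskFinished(Thread t) {
        startTimes.remove(t);
    }

    /** One pass of the alarm thread: threads whose task has run longer than the threshold. */
    List<Thread> findHungThreads(long nowMillis) {
        List<Thread> hung = new ArrayList<>();
        for (Map.Entry<Thread, Long> e : startTimes.entrySet()) {
            long elapsed = nowMillis - e.getValue();
            if (elapsed > thresholdMillis) {
                hung.add(e.getKey());
            }
        }
        return hung;
    }

    /** Formats the current stack of a suspected hung thread for logging. */
    static String describe(Thread t) {
        StringBuilder sb = new StringBuilder("Thread \"" + t.getName() + "\" may be hung\n");
        for (StackTraceElement frame : t.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }
}
```

An alarm thread would call findHungThreads(System.currentTimeMillis()) every X seconds and log describe(t) for each result. A real detection policy would likely be richer than a single threshold.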
Thread Interruption 101
Thread.stop() stops a thread by throwing a ThreadDeath exception (deprecated)
Thread.interrupt(): cooperative mechanism for a thread to signal another thread that it should, at its convenience and if it feels like it, stop what it is doing and do something else.
Interruption is usually the most sensible way to implement task cancellation.
– Because each thread has its own interruption policy, you should not interrupt a thread unless you know what interruption means to that thread.
Any method sensing interruption should
– Assume the current task is cancelled & perform some task-specific cleanup
– Exit as quickly and cleanly as possible, ensuring that callers are aware of the cancellation
• Propagate the exception, making your method an interruptible blocking method too: throw new InterruptedException()
• Or restore the interruption status with Thread.currentThread().interrupt() so that code higher up on the call stack can deal with it
Only code that implements a thread's interruption policy may swallow an interruption request.
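The two options above (propagate, or restore the interruption status) can be illustrated with a small sketch. The class and method names are made up for the example; only BlockingQueue.take() and Thread.currentThread().interrupt() are standard API.

```java
import java.util.concurrent.BlockingQueue;

/** Illustrative only: class and method names are hypothetical. */
final class InterruptionExamples {

    /** Option 1: propagate, so this method is itself an interruptible blocking method. */
    static String takeNext(BlockingQueue<String> queue) throws InterruptedException {
        return queue.take();
    }

    /** Option 2: Runnable.run() cannot throw, so restore the status for callers to see. */
    static Runnable consumer(BlockingQueue<String> queue) {
        return () -> {
            try {
                while (true) {
                    queue.take(); // blocks until an item arrives or the thread is interrupted
                }
            } catch (InterruptedException e) {
                // Task-specific cleanup would go here, then restore the interrupted status.
                Thread.currentThread().interrupt();
            }
        };
    }
}
```

Swallowing the exception in the catch block without calling interrupt() would hide the cancellation request from whoever owns the thread.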
Interrupting threads
Cancelling Threads
Dealing with Non‐interruptible Blocking
Many blocking library methods respond to interruption by returning early and throwing InterruptedException
– Makes it easier to build tasks that are responsive to cancellation
– Lock.lockInterruptibly, Thread.sleep, Object.wait, Thread.join
Not all blocking methods or blocking mechanisms are responsive to interruption
– If a thread is blocked performing synchronous socket I/O, interruption has no effect other than setting the thread's interrupted status
– If a thread is blocked waiting for an intrinsic lock, there is nothing you can do to stop it short of ensuring that it eventually acquires the lock
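For contrast with the intrinsic-lock case, here is a minimal sketch of a waiter blocked in Lock.lockInterruptibly() being cancelled by interrupt(). The class and method names are assumptions for the example; a thread blocked on a synchronized block in the same position would stay blocked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative only: shows that a lockInterruptibly() waiter can be cancelled. */
final class InterruptibleLockDemo {

    /** Returns true if the blocked waiter reacted to interrupt() within five seconds. */
    static boolean waiterRespondsToInterrupt() {
        final ReentrantLock lock = new ReentrantLock();
        final CountDownLatch cancelled = new CountDownLatch(1);
        lock.lock(); // the calling thread holds the lock, so the waiter cannot acquire it
        try {
            Thread waiter = new Thread(() -> {
                try {
                    lock.lockInterruptibly();
                    lock.unlock(); // only reached if the interrupt never arrives
                } catch (InterruptedException e) {
                    cancelled.countDown(); // the wait for the lock was abandoned
                }
            });
            waiter.start();
            waiter.interrupt();
            return cancelled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```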
Thread Hang Recovery – Technique
Application specific hacks for thread hang recovery
Byte code instrumentation
Transform the concrete subclasses of the abstract classes InputStream & OutputStream to make the socket I/O operations interruptible.
Transform an application class so that every loop can be interrupted by invoking Interrupter.interrupt(Thread, boolean)
Transform a monitorenter instruction and a monitorexit instruction so that the wait at entering into a monitor is interruptible
http://www.ibm.com/developerworks/websphere/downloads/hungthread.html
Memory Leaks
Leaks come in various types, such as
– Memory leaks
– Thread and ThreadLocal leaks
– ClassLoader leaks
– System resource leaks
– Connection leaks
Customers want to increase application uptime without cycling the server.
– Frequent application restarts without stopping the server.
Frequent redeployments of the application result in OOM errors
What do we have today
– Offline post-mortem analysis of a JVM heap with tools like JRockit Mission Control and MAT. IEMA is the IBM Extensions for Memory Analyzer
– Runtime memory leak detection using JVMTI and PMI (Runtime Performance Advisor)
We don't have application-level, i.e. top-down, memory leak detection and protection
– Leak detection by looking at suspect patterns in application code
ClassLoader Leaks 101
A class is uniquely identified by
– Its name + the class loader that loaded it
A class with the same name can be loaded multiple times in a single JVM, each in a different class loader
– Web containers use this for isolating web applications
– Each web application gets its own class loader
Reference chain
– An object retains a reference to the class it is an instance of
– A class retains a reference to the class loader that loaded it
– The class loader retains a reference to every class it loaded
Retaining a reference to a single object from a web application pins every class loaded by the web application
These references often remain after a web application reload
With each reload, more classes get pinned, ultimately leading to an OOM
Tomcat pioneered approach - Leak Prevention
JRE triggered leak
– Singleton / static initializer
• Can be a Thread
• Something that won't get garbage collected
– Retains a reference to the context class loader when loaded
– If web application code triggers the initialization
• The context class loader will be the web application class loader
• A reference is created to the web application class loader
• This reference is never garbage collected
• Pins the class loader (and hence all the classes it loaded) in memory
Prevention with a DeployedObjectListener
– Calling various parts of the Java API that are known to retain a reference to the current context class loader
– Initialize these singletons when the Application Server's class loader is the context class loader
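The prevention idea (trigger the singleton's initialization while the server's class loader is the context class loader) can be sketched as below. ContextClassLoaderGuard and runWith are hypothetical names; the slides do not show how the DeployedObjectListener does this.

```java
/** Illustrative helper: the class and method names are assumptions. */
final class ContextClassLoaderGuard {

    /**
     * Runs the initializer with serverLoader as the context class loader,
     * then restores whatever loader was set before.
     */
    static void runWith(ClassLoader serverLoader, Runnable initializer) {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        current.setContextClassLoader(serverLoader);
        try {
            // Any singleton initialized here captures serverLoader, not a web app loader.
            initializer.run();
        } finally {
            current.setContextClassLoader(original);
        }
    }
}
```

The initializer would touch whichever JRE APIs are known to capture the context class loader. That list is JRE-specific and is not enumerated on the slide.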
Leak Detection
Application Triggered Leaks
– ClassLoader
– Threads
– ThreadLocal
– JDBC Drivers
– Non Application
• RMI Targets
• Resource Bundle
• Static final references
• IntrospectionUtils
• Loggers
Prevention
– Code executes when a web application is stopped, un-deployed or reloaded
– Check, via a combination of standard API calls and some reflection tricks, for known causes of memory leaks
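As one example of a check that needs only standard API calls, the sketch below looks for live threads whose context class loader is still the stopped application's loader. ThreadLeakCheck is a hypothetical name, and this covers only the thread case from the list above.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative check for one known leak cause; the class name is an assumption. */
final class ThreadLeakCheck {

    /** Returns live threads whose context class loader is the given application loader. */
    static List<Thread> threadsPinning(ClassLoader appLoader) {
        List<Thread> suspects = new ArrayList<>();
        // Thread.getAllStackTraces() is used here only to enumerate live threads.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getContextClassLoader() == appLoader) {
                suspects.add(t);
            }
        }
        return suspects;
    }
}
```

A container could run this after stopping an application and warn about any thread that is still alive.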
Memory leak detection console
What is wrong with my application …?
Why does my application run slow every time I do X ?
Why does my application have erratic response times ?
Why am I getting Out of Memory Errors ?
What is my application's memory footprint ?
Which parts of my application are CPU intensive ?
How did my JVM vanish without a trace ?
Why is my application unresponsive ?
What monitoring do I put in place for my app. ?
What is your JVM up to ?
Windows-style task manager for displaying thread status and allowing for their recovery & interruption
Leverage the ThreadMXBean API in the JDK to display thread information
– https://github.com/kelapure/dynacache/blob/master/scripts/AllThreads.jsp
– https://github.com/kelapure/dynacache/blob/master/scripts/ViewThread.jsp
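A minimal sketch of reading thread information through ThreadMXBean follows. It is not taken from the linked JSPs; the class name ThreadLister and the output format are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Illustrative only: builds a task-manager style listing of live threads. */
final class ThreadLister {

    /** One line per live thread: id, name, state and top stack frame. */
    static String listThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // A maxDepth of 1 requests only the top frame of each thread's stack.
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), 1)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            StackTraceElement[] stack = info.getStackTrace();
            out.append(info.getThreadId()).append(' ')
               .append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState())
               .append(stack.length > 0 ? " at " + stack[0] : "")
               .append('\n');
        }
        return out.toString();
    }
}
```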
Application runs slow when I do XXX ?
Understand impact of activity on components
– Look at the thread & method profiles
• IBM Java Health Center
• VisualVM
• JRockit Mission Control
JVM method & dump trace: pinpoint performance problems
– Shows entry & exit times of any Java method
• Method trace to file for all methods in tests.mytest.package
– Allows taking javadump, heapdump, etc. when a method is hit
• Dump javacore when method testInnerMethod in an inner class TestInnerClass of a class TestClass is called
– Use BTrace, -Xtrace and -Xdump to trigger dumps on a range of events
• gpf, user, abort, fullgc, slow, allocation, thrstop, throw …
• Stack traces, tool launching
Application has erratic response times ?
Verbose GC should be enabled by default
– <2% impact on performance
VisualGC, GCMV & PMAT: visualize GC output
– In-use space after GC
• Positive gradient over time indicates memory leak
• Increased load (use for capacity plan)
• Memory leak (take heapdumps for problem determination)
Choose the right GC policy
– Optimized for "batch" type applications, consistent allocation profile
– Tight responsiveness criteria, allocations of large objects
– High rates of object "burn", large # of transitional objects
– 12, 16 core SMP systems with allocation contention (AIX only)
GC overhead > 10%: wrong policy or more tuning needed
Enable compressed references for 64-bit JVM
Out Of Memory Errors ?
JVM heap sized incorrectly
– GC adapts heap size to keep occupancy within [40, 70]%
Determine heap occupancy of the app under load
– Xmx = 43% larger than max. occupancy of app
• For 700MB occupancy, a 1000MB max. heap is required (700 + 43% of 700)
Analyze heapdumps & system dumps with tools like Eclipse Memory Analyzer
– Lack of Java heap or native heap
– Eclipse Memory Analyzer and IBM extensions
Finding which methods allocated large objects
– Prints stack trace for all objects above 1K
Enable Java heap and native heap monitoring
– JMX and metrics output by JVM
Classloader exhaustion
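The heap sizing rule from this slide (maximum heap about 43% above peak occupancy, so that occupancy sits near 70%) and heap monitoring over JMX can be sketched together. HeapSizing and its methods are hypothetical names; MemoryMXBean is the standard JDK API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Illustrative only: applies the slide's 43% rule and reads heap usage via JMX. */
final class HeapSizing {

    /** Suggested -Xmx in MB: peak occupancy plus 43%, rounded to the nearest MB. */
    static long suggestedMaxHeapMb(long peakOccupancyMb) {
        return Math.round(peakOccupancyMb * 1.43);
    }

    /** Current heap occupancy as a percentage of the committed heap. */
    static double currentOccupancyPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return 100.0 * heap.getUsed() / heap.getCommitted();
    }
}
```

For a 700 MB peak occupancy, suggestedMaxHeapMb returns 1001, which the slide rounds to 1000 MB. Peak occupancy should be measured under load, for example from the verbose GC log.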
Application's memory footprint ?
HPROF: profiler shipped with the JDK, uses JVMTI
– Analysis of memory usage: -Xrunhprof:heap=all
Performance Inspector tools: JPROF Java Profiling Agent
– Capture state of the Java heap, later processed by HDUMP
Group a system dump by classloader
– Since each app has its own classloader, you can get accurate information on how much heap each application is taking up
Use MAT to investigate heapdumps & system dumps
– Find large clumps, inspect those objects, what retains them ?
• Why is this object not being garbage collected: List Objects > incoming refs, Path to GC roots, Immediate dominators
• Limit analysis to a single application in a JEE environment: Dominator tree grouped by ClassLoader
• Set of objects that can be reclaimed if we could delete X: Retained Size graphs
• Traditional memory hogs like HTTPSession, Cache: use Object Query Language (OQL)
Using Javacores for Troubleshooting
Javacores are often the most critical piece of information to resolve a hang, high CPU, crash and sometimes memory problems
A Javacore is a text file that contains a lot of useful information
– The date, time, Java™ version, full command path and arguments
– All the threads in the JVM, including thread state, priority, thread ID, name
– Thread call stacks
Javacores can be generated automatically or on demand
– Automatically when an OutOfMemoryError is thrown
– On demand with "kill -3 <pid>"
Message to the SystemOut when a javacore is generated
"WebContainer : 537" (TID:0x088C7200, sys_thread_t:0x09C19F00, state:CW, native ID:0x000070E8) prio=5
 at java/net/SocketInputStream.socketRead0(Native Method)
 at java/net/SocketInputStream.read(SocketInputStream.java:155)
 at oracle/net/ns/Packet.receive(Bytecode PC:31)
 at oracle/net/ns/DataPacket.receive(Bytecode PC:1)
 at oracle/net/ns/NetInputStream.read(Bytecode PC:33)
 at oracle/jdbc/driver/T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1123)
 at oracle/jdbc/driver/T4C8Oall.receive(T4C8Oall.java:480)
 at oracle/jdbc/driver/T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:813)
 at oracle/jdbc/driver/OracleStatement.doExecuteWithTimeout(OracleStatement.java:1154)
 at oracle/jdbc/driver/OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3415)
 at com/ibm/commerce/user/objects/EJSJDBCPersisterCMPDemographicsBean_2bcaa7a2.load()
 at com/ibm/ejs/container/ContainerManagedBeanO.load(ContainerManagedBeanO.java:1018)
 at com/ibm/ejs/container/EJSHome.activateBean(EJSHome.java:1718)
CPU intensive parts of the app?
Thread dumps or Javacores: poor man's profiler
– Periodic javacores
– Thread analysis using the Thread Monitor Dump Analyzer tool
High CPU is typically diagnosed by comparing two key pieces of information
– Using Javacores, determine what code the threads are executing
– Gather CPU usage statistics by thread
For each Javacore compare the call stacks between threads
– Focus on request processing threads first
– Are all the threads doing similar work?
Are the threads moving ?
Collect CPU statistics per thread
Is there one thread consuming most of the CPU?
Are there many active threads each consuming a small percentage of CPU?
High CPU due to excessive garbage collection ?
If this is a load/capacity problem then use the HPROF profiler
– -Xrunhprof:cpu=samples, -Xrunhprof:cpu=times
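Per-thread CPU statistics can also be gathered from inside the JVM. The sketch below takes two snapshots of ThreadMXBean CPU times and reports which thread consumed the most CPU in between. ThreadCpuSampler is a hypothetical name, and operating system tools remain an alternative source for the same numbers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: compares two per-thread CPU snapshots. */
final class ThreadCpuSampler {

    /** CPU time in nanoseconds per live thread id; empty if the JVM cannot measure it. */
    static Map<Long, Long> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, Long> cpu = new HashMap<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return cpu;
        }
        for (long id : mx.getAllThreadIds()) {
            long nanos = mx.getThreadCpuTime(id); // -1 if the thread is no longer alive
            if (nanos >= 0) {
                cpu.put(id, nanos);
            }
        }
        return cpu;
    }

    /** Id of the thread with the largest CPU increase between snapshots, or -1 if none. */
    static long busiestThread(Map<Long, Long> before, Map<Long, Long> after) {
        long bestId = -1;
        long bestDelta = -1;
        for (Map.Entry<Long, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (delta > bestDelta) {
                bestDelta = delta;
                bestId = e.getKey();
            }
        }
        return bestId;
    }
}
```

Taking snapshots a few seconds apart and matching the busiest thread id against the javacore call stacks gives the two pieces of information the slide asks for.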
Diagnosis - Hangs
Often hangs are due to unresponsive synchronous requests
– SMTP Server, Database, Map Service, Store Locator, Inventory, Order processing, etc.
3XMTHREADINFO "Servlet.Engine.Transports : 11" (TID:0x7DD38040, sys_thread_t:0x44618828, state:R, native ID:0x4A9F) prio=5
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.SQLExecute()
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.execute2(DB2PreparedStatement.java)
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.executeQuery(DB2PreparedStatement.java()
4XESTACKTRACE at ...
3XMTHREADINFO "Servlet.Engine.Transports : 12" (TID:0x7DD37FC0, sys_thread_t:0x4461BDA8, state:R, native ID:0x4BA0) prio=5
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.SQLExecute()
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.execute2(DB2PreparedStatement.java)
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.executeQuery(DB2PreparedStatement.java()
4XESTACKTRACE at ...
3XMTHREADINFO "Servlet.Engine.Transports : 13" (TID:0x7DD34C50, sys_thread_t:0x4465B028, state:R, native ID:0x4CCF) prio=5
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.SQLExecute()
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.execute2(DB2PreparedStatement.java)
4XESTACKTRACE at COM.ibm.db2.jdbc.app.DB2PreparedStatement.executeQuery(DB2PreparedStatement.java()
Not all hangs are waiting on an external resource
– A JVM can hang due to a synchronization problem: one thread blocking several others
3XMTHREADINFO "Servlet.Engine.Transports : 11" (TID:0x7DD38040, sys_thread_t:0x44618828, state:R, native ID:0x4A9F) prio=5
3LKMONOBJECT com/ibm/ws/cache/Cache@0x65FB8788/0x65FB8794: owner "Default : DMN0" (0x355B4800)
3LKWAITERQ Waiting to enter:
3LKWAITER "WebContainer : 0" (0x3ACCD000)
3LKWAITER "WebContainer : 1" (0x3ACCCB00)
3LKWAITER "WebContainer : 2" (0x38D68300)
3LKWAITER "WebContainer : 3" (0x38D68800)
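The same "one owner, several waiters" picture can be produced from a running JVM. The sketch below groups BLOCKED threads by the thread that owns the monitor they are waiting for, using ThreadMXBean. LockOwnerReport is a hypothetical name, and the result is a rough live equivalent of the 3LKMONOBJECT / 3LKWAITER lines rather than a javacore parser.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: reports which thread is blocking which others. */
final class LockOwnerReport {

    /** Maps each lock-owning thread name to the names of threads BLOCKED on its lock. */
    static Map<String, List<String>> blockedByOwner() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, List<String>> report = new TreeMap<>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue; // thread has ended, or is not waiting to enter a monitor
            }
            String owner = info.getLockOwnerName();
            if (owner == null) {
                continue; // the lock was released before the owner could be read
            }
            report.computeIfAbsent(owner, k -> new ArrayList<>()).add(info.getThreadName());
        }
        return report;
    }
}
```

An owner with a long list of waiters across several samples is the first thread whose stack is worth reading.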
How did my JVM vanish without trace ?
JVM process crash: usual suspects
– Bad JNI calls, segmentation violations, call stack overflow
– Native memory leaks: object allocation fails with sufficient space in the JVM heap
– Unexpected OS exceptions (out of disk space, file handles), JIT failures
Monitor the OS process size
Runtime check of JVM memory allocations
– -Xcheck:memory
Native memory usage: create a core dump on an OOM
JNI code static analysis: -Xcheck:jni (errors, warnings, advice)
GCMV provides scripts and graphing for native memory
– Windows "perfmon", Linux "ps" & AIX "svmon"
Find the last stack of native code executing on the thread during the crash
The signal info (1TISIGINFO) will show the Javacore was created due to a crash
– Signal 11 (SIGSEGV) or GPF
What do I monitor ?
Top Malpractices
No architecture plan
No migration plan
No change records
No capacity plan
No production traffic profile
Changes put directly in Prod.
No load & stress testing
Communication breakdown
No education
Application error
Test environment != Production
Support Assistant Workbench to help with Problem Determination
One stop shop for tools to analyze JVM issues
Tools (Problem / Artifact / Monitoring & Analysis)
Problem: Memory leaks, Out of Memory errors, Application unresponsive
– Artifact: Verbose garbage collection log (native_stdout.log)
– Monitoring & Analysis: GCMV, VisualGC, jps, jstat, jstatd, jinfo
Problem: High CPU, Crash, Hang, Performance bottleneck, Unexpected termination
– Artifact: Javadump, Javacore (javacore*.txt)
– Monitoring & Analysis: Thread Monitor & Dump Analyzer (TMDA)
Problem: Lock contention, Low CPU at high load
– Artifact: Threads (connection to running JVM)
– Monitoring & Analysis: Sun VisualVM, JConsole, IBM Health Center, JRockit Mission Control
Problem: Memory leak, Out of Memory errors
– Artifact: Heapdump (*.phd, *.txt, *.hprof)
– Monitoring & Analysis: MAT, HeapAnalyzer, JHat
Problem: Native memory leak, Anomalies, Unexpected crash
– Artifact: System or core dump (core.dmp, user.dmp); files must be processed with the jextract tool
– Monitoring & Analysis: Monitor with GCMV; examine with pmap & VMMap; track with DebugDiag, libumem, valgrind, cmalloc & NJAMD
Runtime Serviceability aids
Troubleshooting panels in the administration console
Performance Monitoring Infrastructure metrics
Diagnostic Provider MBeans
– Dump configuration, state and run self-test
Application Response Measurement / Request Metrics
– Follow transaction end-to-end and find bottlenecks
Trace logs & First Failure Data Capture
Runtime Performance Advisors
– Memory leak detection, session size, …
Specialized tracing and runtime checks
– Tomcat classloader leak detection
– Session crossover, connection leak, ByteBuffer leak detection
– Runaway CPU thread protection
References
Java theory and practice: Dealing with InterruptedException
– http://www.ibm.com/developerworks/java/library/j-jtp05236/index.html
Architectural design for resilience
– http://dx.doi.org/10.1080/17517570903067751
IBM Support Assistant
– http://www-01.ibm.com/software/support/isa/download.html
How Customers get into trouble
– http://www-01.ibm.com/support/docview.wss?uid=swg27008359
Q&A
Thank You