uPortal Performance & Memory Issues
Scott Battaglia [email protected]
Rutgers, the State University of New Jersey
Description of Problem
Amount of memory consumed by uPortal grows consistently
Continues to consume memory until there is no memory left
Application stops working properly and hangs
Consistent with definition of a memory leak
Background
Launched myRutgers on uPortal 2.3
Issue was not seen in our QA
Seeing issue in production since November 2004
Background
Also seen in production by:
  Yale University
  University of Louisiana at Lafayette
  University of California at Irvine
  Cornell University
Temporary Workaround
Monitor memory usage of uPortal
When free memory drops below 5%, bounce (restart) the JVM.
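The threshold check behind this workaround can be sketched in a few lines of Java. This is a minimal sketch, not the monitoring Rutgers actually ran: the class name HeapWatch and its methods are hypothetical, and it assumes the 5% figure refers to free heap measured against the JVM's maximum heap.

```java
// Sketch only: HeapWatch is a hypothetical name, not part of uPortal.
final class HeapWatch {
    /** Percentage of the maximum heap that is still available. */
    static double freePercent(long maxBytes, long totalBytes, long freeBytes) {
        long used = totalBytes - freeBytes;          // bytes currently occupied
        return 100.0 * (maxBytes - used) / maxBytes; // headroom against the heap limit
    }

    /** True when headroom has fallen below the threshold (assumed: 5%). */
    static boolean shouldRestart(double freePercent, double thresholdPercent) {
        return freePercent < thresholdPercent;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        double free = freePercent(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
        System.out.printf("free heap: %.1f%%, restart: %b%n", free, shouldRestart(free, 5.0));
    }
}
```

A check like this would normally run on a timer or be exposed to an external monitor; restarting the JVM itself is left to whatever tooling manages the servers.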
Issues with Workaround
May be too aggressive
  In some cases, JVM may be able to garbage collect
Causes users on that JVM to lose their session
If we miss the window of opportunity to restart, it can take down Apache also
Issues with Workaround
Ultimately, does nothing to resolve memory issue.
Just makes it barely livable
History of Fixes
Removed caching of IPersons from PersonDirectory
CError and CSecureInfo now pass events to wrapped channels.
Restrict access to ChannelFactory’s channel cache, synchronized instantiateChannel method.
Guest sessions created on time out
AbstractMultithreadedChannels were not cleaning out their channel state maps (2 of them).
But…
3 months later, issue still exists.
Previous steps solved memory leaks but still more exist.
The search continues…
What’s Happening Today
Renewed effort to search for memory leaks
Initial steps taken:
  Retooling of Load Tests
  Production Snapshots
  Incremental Updates
  Re-affirming that loadtest system matches production system
Retooling of Load Tests
Attempt to mimic more closely what a user does in production.
  More custom layouts
  Fewer people logging out
  Hitting more popular channels more aggressively
Retooling of Load Tests
Attempt to accomplish same throughput
  Determine average user session length
  Determine rate at which users access system
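One way to turn those two measurements into a load-test target is Little's Law: average concurrent sessions = arrival rate x average session length. The sketch below is an assumption about how the numbers could be combined, not the method described in the slides; the class name LoadModel and the figures in main are hypothetical.

```java
// Sketch only: LoadModel and its sample figures are hypothetical.
final class LoadModel {
    /** Little's Law: average concurrent sessions = arrival rate x average session length. */
    static double concurrentSessions(double loginsPerMinute, double sessionMinutes) {
        return loginsPerMinute * sessionMinutes;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 40 logins per minute, 12-minute average session.
        System.out.println(concurrentSessions(40, 12));
    }
}
```

The load test would then be configured to hold roughly that many simulated sessions open while logging users in at the measured rate.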
Retooling of Load Tests
Bought test system with same specs/setup as production systems
Ensure database optimizations are the same
Ensure uPortal configuration is the same (i.e. StatsRecorder)
Production Snapshots
Only seeing issue in production
Need to capture production snapshots
JVM Heap Size initially set at 2 GB
Production Snapshots
Lowered JVM Heap Size to 128 MB on machine
  Allows us to compare snapshots
When memory reaches 10%, take it out of load balancing rotation
Garbage Collect
Production Snapshots
Capture snapshot
Wait past session timeout
  Currently set at 15 minutes
Garbage Collect again
Take new snapshot
Analyze Snapshot
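The collect, wait, collect again sequence can be approximated with the standard java.lang.management API, as a rough sanity check alongside the profiler. This is a sketch under assumptions: HeapCheckpoint is a hypothetical helper, it measures used heap rather than capturing a real snapshot, and the JVM is free to treat a GC request as a hint.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Sketch only: HeapCheckpoint is a hypothetical helper, not a profiler snapshot.
final class HeapCheckpoint {
    /** Requests a collection, then reports used heap in bytes. */
    static long usedAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // same effect as System.gc(); the JVM may treat it as a hint
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Bytes released between the two checkpoints. */
    static long releasedBytes(long usedBefore, long usedAfter) {
        return usedBefore - usedAfter;
    }

    public static void main(String[] args) throws InterruptedException {
        // Pass the wait in milliseconds, e.g. 960000 to wait past a 15-minute timeout.
        long waitMillis = args.length > 0 ? Long.parseLong(args[0]) : 0L;
        long before = usedAfterGc();
        Thread.sleep(waitMillis);
        long after = usedAfterGc();
        System.out.println("released bytes: " + releasedBytes(before, after));
    }
}
```

Whatever is still in use after the second collection, with no active sessions, is the memory worth examining in the snapshot.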
Production Snapshots
What do they tell us?
  They help us determine what objects are still in memory
  They tell us how much memory those objects are using
  They tell us how much memory the items they reference are using
Understanding the Snapshots
Use YourKit Java Profiler to capture memory snapshots
YourKit consists of two parts:
  Component that runs on server
  Local application to open memory snapshots
Understanding the Snapshots
YourKit tells us:
  Incoming and outgoing references
  Totals for objects of each type
  How much memory they consume
It also allows us to compare snapshots, showing the deltas of each object type.
uPortal community has about 20 licenses for YourKit
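The idea of comparing snapshots by per-type deltas can be illustrated without the profiler. The sketch below assumes each snapshot has been reduced to a map from class name to instance count; SnapshotDiff is a hypothetical class and does not use any YourKit API.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: SnapshotDiff is hypothetical and works on plain class-name/count maps.
final class SnapshotDiff {
    /** Per-class change in instance count between two snapshots (after minus before). */
    static Map<String, Long> deltas(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (delta != 0) result.put(e.getKey(), delta);
        }
        // Classes that disappeared entirely show up as negative deltas.
        for (Map.Entry<String, Long> e : before.entrySet()) {
            if (!after.containsKey(e.getKey()) && e.getValue() != 0) {
                result.put(e.getKey(), -e.getValue());
            }
        }
        return result;
    }
}
```

Types whose counts keep growing between snapshots taken with no users on the system are the first candidates for a leak.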
Understanding the Snapshots
[Screenshot: YourKit object table with columns Name, Objects, Shallow Size, Retained Size]
Understanding the Snapshots
Trace the path to the root of the Garbage Collector
Option of seeing first path or multiple paths
In screenshot, we see first five
Understanding the Snapshots
Example of object from “Retained Size”
The only reason this object still exists is that XRTreeFrag has not been GCed.
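Retained size is the memory that would be freed if an object were collected: its own shallow size plus everything reachable only through it. The toy calculation below illustrates the definition on a small object graph. It is a sketch, not how YourKit computes the figure; the class RetainedSize and the string node names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: a toy object graph keyed by name, not a real heap walker.
final class RetainedSize {
    /** Objects reachable from the roots, never walking through 'skip' (may be null). */
    static Set<String> reachable(Map<String, List<String>> refs,
                                 Collection<String> roots, String skip) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            String node = todo.pop();
            if (node.equals(skip) || !seen.add(node)) continue;
            todo.addAll(refs.getOrDefault(node, Collections.<String>emptyList()));
        }
        return seen;
    }

    /** Shallow sizes of everything that becomes unreachable once 'target' is removed. */
    static long retainedSize(Map<String, List<String>> refs, Map<String, Long> shallow,
                             Collection<String> roots, String target) {
        Set<String> all = reachable(refs, roots, null);
        if (!all.contains(target)) return 0;
        Set<String> without = reachable(refs, roots, target);
        long total = 0;
        for (String node : all) {
            if (!without.contains(node)) total += shallow.getOrDefault(node, 0L);
        }
        return total;
    }
}
```

An object with a small shallow size but a large retained size, as in the XRTreeFrag example, is holding a large subgraph alive on its own.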
Understanding the Snapshots
Comparison of two snapshots (users vs. no users)
See that XRTreeFrag retains a number of objects
Understanding the Snapshots
Also comparison of (users vs. no users)
See that UserInstance gets garbage collected, as does ChannelStaticData, etc.
Incremental Updates
In order to determine the impact of changes to the uPortal framework, we’ve adopted an incremental update approach.
We apply one “fix” at a time, and monitor its impact.
Incremental Updates
Currently in production…
  Threadpool switch from homegrown to Backport Concurrent
  Finalizer in UBC_Webmail
In the queue…
  Update to AuthorizationImpl
What’s Happening Today
Recently, flurry of activity on JASIG-DEV list about memory issues.
  Backport Concurrent Threadpool
  AuthorizationImpl
  Finalizers in UBC_Webmail
What’s Happening Today
Backport Concurrent Thread Library
Issues with current threadpool
  Potential for deadlock or infinite loop
  Potential for cleanup to fail in thread workers
  UnboundedThreadpool that extends BoundedThreadpool
What’s Happening Today
Backport Concurrent Thread Library (cont)
Action Item
  Aaron wrote patch against HEAD to replace thread library
  Rutgers manually applied patch to 2.4.1 and placed into production.
Result:
  Undetermined: Most students were on Spring Break
  Preliminary results indicate it may offer a performance benefit rather than a memory leak fix
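For readers unfamiliar with the library, the Backport Concurrent package mirrors the java.util.concurrent API for older JVMs. The sketch below uses the standard java.util.concurrent classes to show the kind of bounded pool with explicit shutdown that such a switch makes available. It is an illustration only: the patch itself is not shown in these slides, and BoundedPool, its pool size, and its queue size are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: BoundedPool and its sizes are hypothetical, not the uPortal patch.
final class BoundedPool {
    /** Fixed number of workers, bounded queue, overflow runs on the submitting thread. */
    static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Runs the given number of trivial tasks and returns how many completed. */
    static int runAll(int tasks) {
        ThreadPoolExecutor pool = create(4, 16);
        AtomicInteger done = new AtomicInteger();
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> done.incrementAndGet());
            }
        } finally {
            pool.shutdown(); // always release worker threads, even if submission fails
        }
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) pool.shutdownNow();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

The bounded queue and the shutdown in a finally block address the two concerns listed earlier: unbounded growth and worker cleanup that can fail silently.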
What’s Happening Today
AuthorizationImpl
Current Issues
  Retaining references to principals
  No explicit removal of principal from cache
  Copying of map on each newPrincipal call that results in a new principal
What’s Happening Today
AuthorizationImpl
Action Item
  Rutgers volunteered to provide fix for HEAD
  Fix consists of replacing current newPrincipal method and replacing HashMap with a cache
  Patch is scheduled to be loadtested and placed into production
  Patch is scheduled to be committed to uPortal HEAD on successful test and deployment
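The difference between a plain HashMap and a cache is that a cache evicts. A minimal sketch of that idea, using only the JDK, is a LinkedHashMap in access order with a size limit. This is not the Rutgers patch and not WhirlyCache; PrincipalCache and its capacity are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: a size-bounded LRU map; PrincipalCache is a hypothetical name.
final class PrincipalCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    PrincipalCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, least recent first
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insert; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

LinkedHashMap is not thread-safe, so a shared instance would need to be wrapped, for example with Collections.synchronizedMap.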
What’s Happening Today
AuthorizationImpl
Consequences of Changes
  Introduced a CacheFactory
    Not specific to any one part of uPortal
    CacheFactory is interface (plug your own in!)
    Default CacheFactory using WhirlyCache
  Allows for declaring cache settings and policy in XML
  Allows for fine-grained caching strategies for each part of uPortal
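The pluggable-factory idea can be sketched as an interface that hands out named caches, with an implementation chosen at deployment time. The names below (SketchCacheFactory, InMemoryCacheFactory, getCache) are hypothetical and are not the actual uPortal CacheFactory signature or the WhirlyCache API, neither of which is shown in these slides.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: hypothetical names, not the real uPortal CacheFactory interface.
interface SketchCacheFactory {
    /** Returns the cache registered under this name, creating it on first use. */
    Map<Object, Object> getCache(String name);
}

// A trivial default; a real deployment would plug in an evicting cache implementation.
final class InMemoryCacheFactory implements SketchCacheFactory {
    private final ConcurrentHashMap<String, Map<Object, Object>> caches =
            new ConcurrentHashMap<>();

    @Override
    public Map<Object, Object> getCache(String name) {
        return caches.computeIfAbsent(name, n -> new ConcurrentHashMap<Object, Object>());
    }
}
```

Because callers ask for a cache by name, each part of the portal can be given its own size and eviction policy without changing the calling code.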
What’s Happening Today
UBC_Webmail
Issue
  Finalizers are not properly cleaning up
Action Item
  Rutgers has volunteered to refactor Finalizers
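A common alternative to relying on a finalizer is explicit, deterministic cleanup. The sketch below shows the pattern with AutoCloseable and try-with-resources. MailConnection is a hypothetical stand-in; the UBC_Webmail code and the planned refactoring are not shown in these slides.

```java
// Sketch only: MailConnection is a hypothetical stand-in for a resource-holding object.
final class MailConnection implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    /** Releases the resource immediately instead of waiting for finalization. */
    @Override
    public void close() {
        open = false; // a real class would close sockets, streams, and buffers here
    }
}

final class MailDemo {
    /** Uses a connection and reports whether it was closed when the block exited. */
    static boolean closedAfterUse() {
        MailConnection seen;
        try (MailConnection connection = new MailConnection()) {
            seen = connection; // work with the connection here
        }
        return !seen.isOpen();
    }
}
```

With this pattern the cleanup runs when the block exits, including on exceptions, rather than at some unspecified time chosen by the garbage collector.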
Continuing the Search…
Rutgers and other members of the uPortal community continue to search for the answer to the memory leaks
What can we do to help?
Finalizer should be a last resort
If a viable open source project exists that fills the requirements, consider using that
Be aware of proper caching (where it's needed vs. where it's not needed, weak & soft references, etc.)
Avoid circular references wherever possible
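As an illustration of the soft-reference point, the sketch below wraps cached values in java.lang.ref.SoftReference so the garbage collector may reclaim them under memory pressure. It is a minimal sketch, and SoftCache is a hypothetical class, not something taken from uPortal.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: SoftCache is a hypothetical class illustrating soft references.
final class SoftCache<K, V> {
    private final ConcurrentHashMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    /** Returns the cached value, or null if it was never stored or has been collected. */
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) return null;
        V value = ref.get();
        if (value == null) map.remove(key, ref); // referent was collected; drop the stale entry
        return value;
    }
}
```

Callers must treat a null result as a cache miss and rebuild the value, since the collector decides when softly referenced objects go away.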
The End (finally!)
Any questions, comments, concerns?