uPortal Performance & Memory Issues

Scott Battaglia
scott_battaglia@rutgers.edu

Rutgers, the State University of New Jersey

Description of Problem

Amount of memory consumed by uPortal grows consistently

Continues to consume memory until there is no memory left

Application stops working properly and hangs

Consistent with the definition of a memory leak

Background

Launched myRutgers on uPortal 2.3

Issue was not seen in our QA

Seeing issue in production since November 2004

Background

Also seen in production by:
  Yale University
  University of Louisiana at Lafayette
  University of California at Irvine
  Cornell University

Temporary Workaround

Monitor memory usage of uPortal

When free memory drops below 5%, bounce the JVM.
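A minimal sketch of the check behind this workaround, assuming the 5% figure refers to free heap as a share of the maximum heap. The class and method names are hypothetical and the restart itself is left to whatever external monitoring is in place; only the standard `Runtime` calls are used.

```java
class MemoryWatch {
    /** Percentage of the maximum heap that is still available to the JVM. */
    static double freePercent(long maxBytes, long totalBytes, long freeBytes) {
        long usedBytes = totalBytes - freeBytes;
        return 100.0 * (maxBytes - usedBytes) / maxBytes;
    }

    /** True when the free share has dropped below the restart threshold. */
    static boolean shouldRestart(double freePercent, double thresholdPercent) {
        return freePercent < thresholdPercent;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        double free = freePercent(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
        System.out.printf("free heap: %.1f%%%n", free);
        if (shouldRestart(free, 5.0)) {
            // Assumption: an operator or script reacts to this message.
            System.out.println("below threshold: schedule a JVM restart");
        }
    }
}
```

A reading taken right before a garbage collection can look worse than it is, which is one reason a fixed threshold may restart too eagerly.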

Issues with Workaround

May be too aggressive

In some cases, the JVM may be able to garbage collect

Causes users on that JVM to lose their session

If we miss the window of opportunity to restart, it can take down Apache also


Issues with Workaround

Ultimately, does nothing to resolve memory issue.

Just makes it barely livable

History of Fixes

Removed caching of IPersons from PersonDirectory

CError and CSecureInfo now pass events to wrapped channels.

Restricted access to ChannelFactory’s channel cache; synchronized the instantiateChannel method.

Guest sessions created on time out

AbstractMultithreadedChannels were not cleaning out their channel state maps (2 of them).

But….

3 months later, issue still exists.

Previous steps solved memory leaks but still more exist.

The search continues…

What’s Happening Today

Renewed effort to search for memory leaks

Initial steps taken:
  Retooling of Load Tests
  Production Snapshots
  Incremental Updates
  Re-affirming that loadtest system matches production system

Retooling of Load Tests

Attempt to mimic more closely what a user does in production.
  More custom layouts
  Fewer people logging out
  Hitting more popular channels more aggressively

Retooling of Load Tests

Attempt to accomplish same throughput
  Determine average user session length
  Determine rate at which users access system


Retooling of Load Tests

Bought test system with same specs/setup as production systems

Ensure database optimizations are the same

Ensure uPortal configuration is the same (i.e. StatsRecorder)

Production Snapshots

Only seeing issue in production

Need to capture production snapshots

JVM heap size initially set at 2 GB

Production Snapshots

Lowered JVM heap size to 128 MB on machine
  Allows us to compare snapshots

When free memory reaches 10%, take the machine out of load balancing rotation

Garbage collect

Production Snapshots

Capture snapshot

Wait past session timeout
  Currently set at 15 minutes

Garbage collect again

Take new snapshot

Analyze snapshot

Production Snapshots

What do they tell us?

They help us determine what objects are still in memory

They tell us how much memory those objects are using

They tell us how much memory the items they reference are using

Understanding the Snapshots

Use YourKit Java Profiler to capture memory snapshots

YourKit consists of two parts:
  Component that runs on server
  Local application to open memory snapshots

Understanding the Snapshots

YourKit tells us:
  Incoming and outgoing references
  Totals for objects of each type
  How much memory they consume

Allows us to compare snapshots, showing the deltas of each object type.

uPortal community has about 20 licenses for YourKit
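To make the idea of per-type deltas concrete, here is a small sketch that compares two class histograms (class name to live object count). It is not YourKit's API. The `SnapshotDiff` class is hypothetical, and the counts in the usage note are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

class SnapshotDiff {
    /**
     * Returns count(after) - count(before) for every class that appears in
     * either histogram. A class missing from one side counts as zero there.
     */
    static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> result = new TreeMap<>();
        for (String name : before.keySet()) {
            result.put(name, -before.get(name));
        }
        for (Map.Entry<String, Long> e : after.entrySet()) {
            result.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return result;
    }
}
```

For example, if a snapshot taken with users shows 40 `UserInstance` and 120 `XRTreeFrag` objects, and the snapshot after session timeout shows 0 and 120, the delta is -40 for `UserInstance` and 0 for `XRTreeFrag`. A type whose count does not fall once the sessions are gone is a candidate for a closer look.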

Understanding the Snapshots

[Screenshot: YourKit class list with columns Name, Objects, Shallow Size, Retained Size]


Understanding the Snapshots

Trace the path to the root of the Garbage Collector

Option of seeing first path or multiple paths

In screenshot, we see first five

Understanding the Snapshots

Example of object from “Retained Size”

Only reason this object still exists is because XRTreeFrag has not been GCed.

Understanding the Snapshots

Comparison of two snapshots (users vs. no users)

See that XRTreeFrag retains a number of objects


Understanding the Snapshots

Also comparison of (users vs. no users)

See that UserInstance gets garbage collected, as does ChannelStaticData, etc.


Incremental Updates

In order to determine the impact of changes to the uPortal framework, we’ve adopted an incremental update approach.

We apply one “fix” at a time, and monitor its impact.

Incremental Updates

Currently in production…
  Threadpool switch from homegrown to Backport Concurrent
  Finalizer in UBC_Webmail

In the queue…
  Update to AuthorizationImpl

What’s Happening Today

Recently, flurry of activity on JASIG-DEV list about memory issues.
  Backport Concurrent Threadpool
  AuthorizationImpl
  Finalizers in UBC_Webmail

What’s Happening Today

Backport Concurrent Thread Library
  Issues with current threadpool
    Potential for deadlock or infinite loop
    Potential for cleanup to fail in thread workers
    UnboundedThreadpool that extends BoundedThreadpool
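As an illustration of what a library-provided pool looks like, here is a sketch using `java.util.concurrent`, the API that the backport library mirrors for older JVMs. It is not the patch described in these slides. The class name, pool size, and queue capacity are assumptions chosen for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RenderPool {
    /** Fixed number of workers and a bounded queue, so pending work cannot grow without limit. */
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                30L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full, the submitting thread runs the task itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits trivial tasks, shuts the pool down, and returns how many tasks completed. */
    static int runTasks(int taskCount) {
        ThreadPoolExecutor pool = newBoundedPool(4, 16);
        AtomicInteger done = new AtomicInteger();
        try {
            for (int i = 0; i < taskCount; i++) {
                pool.execute(done::incrementAndGet);
            }
        } finally {
            pool.shutdown(); // always release the worker threads
        }
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

The bounded queue plus a rejection policy gives back-pressure under load, and the `finally` block keeps worker cleanup from depending on every task finishing normally.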

What’s Happening Today

Backport Concurrent Thread Library (cont)
  Action Item
    Aaron wrote patch against HEAD to replace thread library
    Rutgers manually applied patch to 2.4.1 and placed into production.
  Result:
    Undetermined: most students were on Spring Break
    Preliminary results indicate it may offer a performance benefit rather than a memory leak fix

What’s Happening Today

AuthorizationImpl
  Current Issues
    Retaining references to principals
    No explicit removal of principal from cache
    Copying of map on each newPrincipal call that results in a new principal

What’s Happening Today

AuthorizationImpl
  Action Item
    Rutgers volunteered to provide fix for HEAD
    Fix consists of replacing current newPrincipal method and replacing HashMap with a cache
    Patch is scheduled to be loadtested and placed into production
    Patch is scheduled to be committed to uPortal HEAD on successful test and deployment
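The difference between a plain HashMap and a cache can be sketched as follows. This is not the Rutgers patch. `PrincipalCache`, `getOrCreate`, and the size limit are hypothetical, and the sketch uses a standard `LinkedHashMap` in access order as a simple least-recently-used cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class PrincipalCache<K, V> {
    private final Map<K, V> entries;

    PrincipalCache(final int maxEntries) {
        // accessOrder=true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict instead of growing forever
            }
        };
    }

    /** Returns the cached value, creating it in place without copying the map. */
    synchronized V getOrCreate(K key, Function<K, V> factory) {
        V value = entries.get(key);
        if (value == null) {
            value = factory.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    /** Explicit removal, for example when a session ends. */
    synchronized void remove(K key) {
        entries.remove(key);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With a limit of 2, creating entries for "a", "b", and "c" leaves only "b" and "c" in the cache, so the number of retained principals stays bounded regardless of how many users pass through.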

What’s Happening Today

AuthorizationImpl
  Consequences of Changes
    Introduced a CacheFactory
      Not specific to any one part of uPortal
      CacheFactory is interface (plug your own in!)
      Default CacheFactory using WhirlyCache
    Allows for declaring cache settings and policy in XML
    Allows for fine-grained caching strategies for each part of uPortal
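A rough sketch of the pluggable-factory idea, under stated assumptions: the slides only say that CacheFactory is an interface, so the `getCache(String)` signature below is hypothetical, and the in-memory implementation is a stand-in rather than the WhirlyCache-backed default.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical shape of the interface: callers ask for a cache by name. */
interface CacheFactory {
    Map<Object, Object> getCache(String cacheName);
}

/** Stand-in implementation. A real one would apply size and expiry policy per cache name. */
class InMemoryCacheFactory implements CacheFactory {
    private final Map<String, Map<Object, Object>> caches = new ConcurrentHashMap<>();

    @Override
    public Map<Object, Object> getCache(String cacheName) {
        // One shared cache instance per name, created on first request.
        return caches.computeIfAbsent(cacheName, name -> new ConcurrentHashMap<>());
    }
}
```

Because callers depend only on the interface, each part of the portal can request its own named cache, and the eviction policy can be changed by swapping the factory rather than editing the callers.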

What’s Happening Today

UBC_Webmail
  Issue
    Finalizers are not properly cleaning up
  Action Item
    Rutgers has volunteered to refactor Finalizers


Continuing the Search…

Rutgers, and other members of the uPortal community continue to search for the answer to the memory leaks

What can we do to help?

Finalizer should be a last resort

If a viable open source project exists that fills the requirements, consider using that

Be aware of proper caching (where it’s needed vs. where it’s not needed, weak & soft references, etc.)

Avoid circular references wherever possible
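One common alternative to relying on a finalizer is explicit, deterministic cleanup. The sketch below uses `AutoCloseable` with try-with-resources (Java 7 and later; on older JVMs a try/finally block does the same job). `MailConnection` is a hypothetical resource invented for the example, not a class from UBC_Webmail.

```java
/** Hypothetical resource that must be released when the caller is done with it. */
class MailConnection implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false; // release sockets, buffers, and so on here
    }
}

class CleanupDemo {
    /** Uses a connection and returns it afterwards so the caller can observe that it was closed. */
    static MailConnection useAndRelease() {
        MailConnection used;
        try (MailConnection conn = new MailConnection()) {
            used = conn; // work with the connection here
        } // close() runs here, even if the body throws
        return used;
    }
}
```

Cleanup then happens at a known point in the code instead of whenever, or if, the garbage collector gets around to finalizing the object.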


The End (finally!)

Any questions, comments, concerns?