Plumbr case study
TECHNICAL OBSTACLES WHEN BUILDING PLUMBR
Nikita Salnikov-Tarnovski
Monday, April 1, 13
AGENDA
Who we were and who we are
Object lifecycle with little overhead
Graph analysis in low memory
The problem of quitting
OUR BACKGROUND
2 developers
Nikita Salnikov-Tarnovski, @iNikem
Vladimir Šor, @vovencij
10+ years at Nortal, a custom software house
Mostly Java EE development
Web sites, backend systems, batch processes
NEW PROBLEM
Memory leaks
130,000 monthly searches for OutOfMemoryError in Google
20,000 monthly unique visitors on our site
http://plumbr.eu
400 monthly downloads
1700+ leaks discovered
PLUMBR
Automated performance consultant
Giving you the exact location of the leak with enough information to fix it
The foundation is based on machine learning
trained on 500,000 memory snapshots
From 3,000 different applications
Finding 88% of the existing leaks.
Quality only going up with the additional data gathered each day.
PLUMBR AGENT...
JVM TI agents
Both Java and native, OS-specific
welcome malloc and free!
JNI code for communication between them
... WATCHES YOU
We monitor object creation and disposal
On-the-fly bytecode instrumentation
Hooks into GC events
OBJECT MONITORING I
Java agent registers java.lang.instrument.ClassFileTransformer
Modifies bytecode as classes are loaded
Using ASM library
To capture all newly created objects
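A minimal sketch of the registration step, assuming a standard `premain` agent entry point. The class names `AllocationTrackingAgent` and `CountingTransformer` are hypothetical, and the actual ASM rewriting is only marked by a comment, since the talk does not show Plumbr's instrumentation code.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical agent skeleton; not Plumbr's actual code. */
final class AllocationTrackingAgent {

    /** Sees every class as it is loaded and decides whether to rewrite it. */
    static final class CountingTransformer implements ClassFileTransformer {
        final AtomicLong classesSeen = new AtomicLong();

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            classesSeen.incrementAndGet();
            // Assumption: skip unnamed classes and core JDK classes to stay safe.
            if (className == null || className.startsWith("java/")) {
                return null;
            }
            // A real agent would rewrite classfileBuffer here (for example with ASM)
            // to insert a callback after each object creation.
            return null; // null tells the JVM to keep the original bytecode
        }
    }

    /** Called by the JVM before main() when started with -javaagent. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new CountingTransformer());
    }
}
```

Returning `null` from `transform` is the documented way to leave a class untouched, which is a safe default for classes that are too fragile to instrument.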
PROBLEMS
Different compilers produce slightly different bytecode
Some classes are too fragile or broken already
new and the chain of <init> calls
Clone, deserialization, reflection
OBJECT MONITORING II
We keep some data about each live object
Creating and associating that data takes time
On every object creation!
OBJECT MONITORING II
If you cannot do it in-process, do it off-process
PROBLEMS
BlockingQueues are slow
Locks are slow
Atomic* are slow!
No existing library
Even Disruptor doesn’t suit
We’ve written a no-guarantee, lock-free, many-producers-one-consumer buffer
Concurrent programming IS hard
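One way to picture such a buffer is the sketch below. It is an illustration under loud assumptions, not Plumbr's implementation: the class `LossyEventBuffer` is hypothetical, it uses plain fields instead of locks or Atomic* classes, and it accepts that concurrent producers can overwrite each other, so events may be lost or seen late.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical best-effort buffer: many producers, one consumer, no locks, no Atomic*. */
final class LossyEventBuffer {
    private final Object[] slots;
    private final int mask;
    // Deliberately a plain field. Two producers may read the same value and
    // write to the same slot, which loses one event instead of blocking.
    private int writeIndex;

    LossyEventBuffer(int capacityPowerOfTwo) {
        if (Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /**
     * Called by any producer thread. Never blocks; overwrites the oldest slot when full.
     * Events should be immutable objects with final fields, so that a racy read
     * still sees them fully constructed.
     */
    void offer(Object event) {
        int i = writeIndex;
        writeIndex = i + 1;
        slots[i & mask] = event;
    }

    /**
     * Called by the single consumer thread. Returns the events it can currently see,
     * in slot order. An event written between the read and the clear of a slot is lost.
     */
    List<Object> drain() {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < slots.length; i++) {
            Object event = slots[i];
            if (event != null) {
                slots[i] = null;
                out.add(event);
            }
        }
        return out;
    }
}
```

The design trades completeness for speed: a monitoring agent can tolerate a few missing allocation events, but it cannot afford to slow down every object creation in the host application.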
MORE PROBLEMS
We have to store all that object-related data somewhere
Java Collections are too fat
No lock-free thread-safe reading
We use Trove to save memory
Hand-written clone with dirty check
Testing persistent immutable data structures
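To illustrate why primitive collections are leaner than `java.util.HashMap<Long, Integer>`, here is a hand-rolled sketch in the spirit of Trove's primitive maps. The class `LongIntMap` is hypothetical and far simpler than Trove: fixed capacity, no removal, no resizing. It stores keys and values in flat arrays, so there are no boxed `Long` or `Integer` objects and no per-entry node objects.

```java
/** Hypothetical fixed-capacity long-to-int map with linear probing. */
final class LongIntMap {
    static final int MISSING = -1; // assumption: -1 is never stored as a real value

    private final long[] keys;
    private final int[] values;
    private final boolean[] used;
    private final int mask;
    private int size;

    LongIntMap(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo < 2 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        keys = new long[capacityPowerOfTwo];
        values = new int[capacityPowerOfTwo];
        used = new boolean[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    void put(long key, int value) {
        int i = Long.hashCode(key) & mask;
        // Probe until we find the key or a free slot; one slot is always
        // kept free, so this loop terminates.
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;
        }
        if (!used[i]) {
            if (size == mask) {
                throw new IllegalStateException("map is full");
            }
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int get(long key) {
        int i = Long.hashCode(key) & mask;
        while (used[i]) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return MISSING;
    }

    int size() {
        return size;
    }
}
```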
LEAK HUNTING
When leaks are detected, we need to find out who is holding them
Paths to GC roots
While the application is still running
PROBLEMS
Java objects do not track their incoming references
You can walk the heap in C code
But that stops the world
Standard heap dump loses information
So we make a custom heap dump
And traverse the reference graph on it
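Since objects only know their outgoing references, a dump has to be inverted before anyone can ask "who holds this object?". The sketch below shows one compact way to do that, assuming objects are numbered 0..n-1 and the dump yields parallel `from`/`to` arrays of reference edges. The class `ReferrerIndex` and this layout are assumptions for illustration, not the format of Plumbr's custom dump.

```java
import java.util.Arrays;

/** Hypothetical incoming-reference index stored in two flat int arrays. */
final class ReferrerIndex {
    /** Referrers of object i are referrers[offsets[i]] .. referrers[offsets[i + 1] - 1]. */
    final int[] offsets;
    final int[] referrers;

    ReferrerIndex(int objectCount, int[] from, int[] to) {
        offsets = new int[objectCount + 1];
        // Count incoming references per object.
        for (int target : to) {
            offsets[target + 1]++;
        }
        // Turn the counts into start offsets (prefix sum).
        for (int i = 0; i < objectCount; i++) {
            offsets[i + 1] += offsets[i];
        }
        // Fill each object's slice with the objects that reference it.
        referrers = new int[to.length];
        int[] next = Arrays.copyOf(offsets, objectCount);
        for (int e = 0; e < to.length; e++) {
            referrers[next[to[e]]++] = from[e];
        }
    }
}
```

For the edges 0→1, 1→2 and 3→2 this gives `offsets = {0, 0, 1, 3, 3}` and `referrers = {0, 1, 3}`: object 2 is held by objects 1 and 3.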
STILL PROBLEMS
We’ve tried many graph traversal libraries
And NoSQL solutions
They all work, somewhat
If you give them gigs of memory
But we have to do this on-site, while the application is still running
We needed a memory-sensitive solution
ONE MORE BICYCLE
We’ve written our own specialized version of Dijkstra’s path search
Again we had to replace many Java Collections with more memory-efficient implementations
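The talk does not show the specialized algorithm, so the following is a simplified stand-in. When every reference counts as one step, Dijkstra's search reduces to a breadth-first search, and that is what this sketch uses. It walks incoming references from a suspected leak to the nearest GC root using only primitive arrays. The class `PathToRoot` and the `offsets`/`referrers` layout (referrers of object i sit in `referrers[offsets[i] .. offsets[i + 1] - 1]`) are assumptions for illustration.

```java
import java.util.Arrays;

/** Hypothetical shortest-path search from a leaked object to the nearest GC root. */
final class PathToRoot {

    /** Returns the path root, ..., target, or an empty array if no root holds the target. */
    static int[] find(int[] offsets, int[] referrers, boolean[] isRoot, int target) {
        int n = isRoot.length;
        int[] parent = new int[n];       // parent[x] = object through which x was reached
        Arrays.fill(parent, -1);
        boolean[] seen = new boolean[n];
        int[] queue = new int[n];        // plain int queue instead of a boxed collection
        int head = 0;
        int tail = 0;
        queue[tail++] = target;
        seen[target] = true;
        while (head < tail) {
            int node = queue[head++];
            if (isRoot[node]) {
                return buildPath(parent, target, node);
            }
            for (int e = offsets[node]; e < offsets[node + 1]; e++) {
                int referrer = referrers[e];
                if (!seen[referrer]) {
                    seen[referrer] = true;
                    parent[referrer] = node;
                    queue[tail++] = referrer;
                }
            }
        }
        return new int[0];
    }

    /** Follows parent links from the root back down to the target. */
    private static int[] buildPath(int[] parent, int target, int root) {
        int length = 1;
        for (int node = root; node != target; node = parent[node]) {
            length++;
        }
        int[] path = new int[length];
        int i = 0;
        for (int node = root; node != target; node = parent[node]) {
            path[i++] = node;
        }
        path[i] = target;
        return path;
    }
}
```

The working memory is three arrays of one entry per object, which is the kind of predictable footprint needed when the search runs inside the application's own JVM.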
TIME TO DIE
Plumbr runs inside the JVM alongside the application
It isn’t the main actor, just a supporter
So Plumbr must be ready to quit whenever the main application wishes
WHEN JVM QUITS
It turns out the JVM is quite hard to kill
No shutdown notification or anything like it
It just quits when there are no more non-daemon threads
And some threads live for far too long
PROBLEMS
Plumbr’s own threads
Threads from libraries that Plumbr uses
ExecutorService with daemon thread factory
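The daemon thread factory mentioned on this slide can be sketched as follows. The helper name `DaemonExecutors` and the thread name passed in are hypothetical. Only `ThreadFactory`, `Thread.setDaemon` and `Executors.newSingleThreadExecutor` are standard JDK API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

/** Hypothetical helper: executors whose threads never keep the JVM alive. */
final class DaemonExecutors {

    static ExecutorService newDaemonSingleThreadExecutor(String threadName) {
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, threadName);
            // Daemon threads are ignored when the JVM decides whether it may exit.
            thread.setDaemon(true);
            return thread;
        };
        return Executors.newSingleThreadExecutor(factory);
    }
}
```

The default thread factory creates non-daemon threads, so an executor that is never shut down would otherwise hold the host application's JVM open.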
PROBLEMS
RMI Reaper Thread
Keeps the JVM alive as long as some JMX resources are in use
We must clean up after ourselves: MBeans, JMX connections, JMX servers
But when???
We implemented our own monitor thread with some heuristics
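The talk does not say which heuristics the monitor thread uses. One plausible check, shown purely as an assumption, is to ask whether any live non-daemon thread remains apart from threads the agent knows it can ignore (for example its own, or ones named "RMI Reaper" or "DestroyJavaVM"). The class `QuitHeuristic` and the ignore list are hypothetical.

```java
import java.util.Collection;
import java.util.Set;

/** Hypothetical heuristic for "the host application is done, time to clean up". */
final class QuitHeuristic {

    /** True if no live non-daemon thread remains other than the ignored ones. */
    static boolean applicationLooksFinished(Collection<Thread> threads,
                                            Set<String> ignoredNames) {
        for (Thread thread : threads) {
            if (thread.isAlive() && !thread.isDaemon()
                    && !ignoredNames.contains(thread.getName())) {
                return false; // some application thread is still working
            }
        }
        return true;
    }

    /** Convenience overload that inspects all threads currently known to the JVM. */
    static boolean applicationLooksFinished(Set<String> ignoredNames) {
        return applicationLooksFinished(Thread.getAllStackTraces().keySet(), ignoredNames);
    }
}
```

A daemon monitor thread could poll this check periodically and, once it returns true, release MBeans, JMX connections and JMX servers so that the remaining threads can end and the JVM can exit.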
PROBLEMS
Earlier versions used some Swing components, e.g. a system tray icon
And the JVM will not quit while there are displayable Swing components
So we should dispose of them before quitting
Again, when???
CONCLUSION
Don’t spend all your time writing web components or web-services or Swing
There is more to Java than that
There are many Java libraries but not enough