A quick howto on using agentlib HPROF and verbose GC logging to try and track down where unwanted objects are allocated but not freed.
The VM switches to use are:
# Track heap usage
# CAVEAT: hprof can make the VM use a lot of memory storing the object graph
-agentlib:hprof=heap=all
# CAVEAT: hprof can make the VM use a lot of memory storing the object graph
-agentlib:hprof=heap=all
and
# Verbose garbage collection (log to file gc.log)
-verbose:gc -Xloggc:gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-verbose:gc -Xloggc:gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
So a script to start the VM with logging and capture the process ID is (on Linux):
#!/bin/sh
nohup java -agentlib:hprof=heap=all -verbose:gc -Xloggc:gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails MyClass &
pid=$!
echo ${pid} > java.pid
nohup java -agentlib:hprof=heap=all -verbose:gc -Xloggc:gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails MyClass &
pid=$!
echo ${pid} > java.pid
Now you can view the gc log:
tail -f gc.log
3.577: [GC [PSYoungGen: 24000K->3936K(27968K)] 24000K->6406K(91840K), 0.0417580 secs] [Times: user=0.06 sys=0.00, real=0.05 secs]
8.217: [GC [PSYoungGen: 27936K->3937K(51968K)] 30406K->12116K(115840K), 0.0937310 secs] [Times: user=0.11 sys=0.02, real=0.10 secs]
26.146: [GC [PSYoungGen: 51937K->3936K(51968K)] 60116K->13835K(115840K), 0.4250640 secs] [Times: user=0.43 sys=0.01, real=0.43 secs]
46.128: [GC [PSYoungGen: 51936K->3968K(99968K)] 61835K->14874K(163840K), 0.4770580 secs] [Times: user=0.48 sys=0.01, real=0.47 secs]
85.577: [GC [PSYoungGen: 99968K->3936K(99968K)] 110874K->16515K(163840K), 0.9341310 secs] [Times: user=0.94 sys=0.01, real=0.93 secs]
125.424: [GC [PSYoungGen: 99936K->5540K(201280K)] 112515K->18118K(265152K), 0.9266500 secs] [Times: user=0.94 sys=0.00, real=0.93 secs]
204.551: [GC [PSYoungGen: 197540K->6168K(201600K)] 210118K->20463K(265472K), 2.0211840 secs] [Times: user=1.99 sys=0.05, real=2.02 secs]
284.787: [GC [PSYoungGen: 198168K->7577K(393024K)] 212463K->22560K(456896K), 1.9697120 secs] [Times: user=1.91 sys=0.07, real=1.97 secs]
443.684: [GC [PSYoungGen: 391385K->8892K(393216K)] 406368K->25936K(457088K), 4.3958790 secs] [Times: user=4.18 sys=0.23, real=4.39 secs]
605.846: [GC [PSYoungGen: 392700K->7582K(624192K)] 409744K->28319K(688064K), 4.4669550 secs] [Times: user=4.24 sys=0.24, real=4.47 secs]
873.443: [GC [PSYoungGen: 621726K->10040K(624192K)] 642463K->31881K(688064K), 7.0002940 secs]
3.577: [GC [PSYoungGen: 24000K->3936K(27968K)] 24000K->6406K(91840K), 0.0417580 secs] [Times: user=0.06 sys=0.00, real=0.05 secs]
8.217: [GC [PSYoungGen: 27936K->3937K(51968K)] 30406K->12116K(115840K), 0.0937310 secs] [Times: user=0.11 sys=0.02, real=0.10 secs]
26.146: [GC [PSYoungGen: 51937K->3936K(51968K)] 60116K->13835K(115840K), 0.4250640 secs] [Times: user=0.43 sys=0.01, real=0.43 secs]
46.128: [GC [PSYoungGen: 51936K->3968K(99968K)] 61835K->14874K(163840K), 0.4770580 secs] [Times: user=0.48 sys=0.01, real=0.47 secs]
85.577: [GC [PSYoungGen: 99968K->3936K(99968K)] 110874K->16515K(163840K), 0.9341310 secs] [Times: user=0.94 sys=0.01, real=0.93 secs]
125.424: [GC [PSYoungGen: 99936K->5540K(201280K)] 112515K->18118K(265152K), 0.9266500 secs] [Times: user=0.94 sys=0.00, real=0.93 secs]
204.551: [GC [PSYoungGen: 197540K->6168K(201600K)] 210118K->20463K(265472K), 2.0211840 secs] [Times: user=1.99 sys=0.05, real=2.02 secs]
284.787: [GC [PSYoungGen: 198168K->7577K(393024K)] 212463K->22560K(456896K), 1.9697120 secs] [Times: user=1.91 sys=0.07, real=1.97 secs]
443.684: [GC [PSYoungGen: 391385K->8892K(393216K)] 406368K->25936K(457088K), 4.3958790 secs] [Times: user=4.18 sys=0.23, real=4.39 secs]
605.846: [GC [PSYoungGen: 392700K->7582K(624192K)] 409744K->28319K(688064K), 4.4669550 secs] [Times: user=4.24 sys=0.24, real=4.47 secs]
873.443: [GC [PSYoungGen: 621726K->10040K(624192K)] 642463K->31881K(688064K), 7.0002940 secs]
Note the time taken to perform GC is increasing.
Now send a QUIT signal to the Java process ID (found in java.pid) to make the VM write a heap dump:
more java.pid
21821
kill -QUIT 21821
21821
kill -QUIT 21821
The heap dump is written to java.hprof.txt (wait until the .TMP version disappears to know the dump is finished).
Now show the end of the dump file to get the table of object use:
tail -n 200 java.hprof.txt
# Look for this part
HEAP DUMP END
SITES BEGIN (ordered by live bytes) Tue Jul 3 12:41:24 2012
percent live alloc'ed stack class
rank self accum bytes objs bytes objs trace name
1 8.93% 8.93% 9125744 233329 187262488 4821805 308067 byte[]
2 7.10% 16.03% 7259144 233330 148688128 4821808 308058 byte[]
3 5.48% 21.51% 5599896 233329 115723392 4821808 308056 byte[]
4 5.48% 26.99% 5599896 233329 115723392 4821808 308057 byte[]
5 5.48% 32.47% 5599896 233329 115723320 4821805 308068 byte[]
6 5.48% 37.95% 5599896 233329 115723320 4821805 308069 byte[]
7 3.67% 41.62% 3749872 92336 75387584 1851930 300018 char[]
# Look for this part
HEAP DUMP END
SITES BEGIN (ordered by live bytes) Tue Jul 3 12:41:24 2012
percent live alloc'ed stack class
rank self accum bytes objs bytes objs trace name
1 8.93% 8.93% 9125744 233329 187262488 4821805 308067 byte[]
2 7.10% 16.03% 7259144 233330 148688128 4821808 308058 byte[]
3 5.48% 21.51% 5599896 233329 115723392 4821808 308056 byte[]
4 5.48% 26.99% 5599896 233329 115723392 4821808 308057 byte[]
5 5.48% 32.47% 5599896 233329 115723320 4821805 308068 byte[]
6 5.48% 37.95% 5599896 233329 115723320 4821805 308069 byte[]
7 3.67% 41.62% 3749872 92336 75387584 1851930 300018 char[]
Now find the traces responsible for the biggest allocations:
grep "TRACE 308067" java.hprof.txt --after-context=10
TRACE 308067:
com.foo.bar.process(AbstractFoo.java:204)
com.foo.bar.FooServer.startFeed(FooServer.java:188)
com.foo.bar.FooServer.main(FooServer.java:391)
TRACE 308067:
com.foo.bar.process(AbstractFoo.java:204)
com.foo.bar.FooServer.startFeed(FooServer.java:188)
com.foo.bar.FooServer.main(FooServer.java:391)
You can also check nohup.out to get a breakdown of the various allocation spaces within the VM:
more nohup.out
Heap
PSYoungGen total 624192K, used 81745K [0x00000007aaab0000, 0x00000007d1610000, 0x0000000800000000)
eden space 614144K, 12% used [0x00000007aaab0000,0x00000007af31cc10,0x00000007d0270000)
from space 10048K, 75% used [0x00000007d0c40000,0x00000007d13a7840,0x00000007d1610000)
to space 10048K, 0% used [0x00000007d0270000,0x00000007d0270000,0x00000007d0c40000)
ParOldGen total 63872K, used 20737K [0x0000000700000000, 0x0000000703e60000, 0x00000007aaab0000)
object space 63872K, 32% used [0x0000000700000000,0x0000000701440750,0x0000000703e60000)
PSPermGen total 21248K, used 10417K [0x00000006fae00000, 0x00000006fc2c0000, 0x0000000700000000)
object space 21248K, 49% used [0x00000006fae00000,0x00000006fb82c6d0,0x00000006fc2c0000)
Dumping Java heap ... allocation sites ... done.
Heap
PSYoungGen total 624192K, used 81745K [0x00000007aaab0000, 0x00000007d1610000, 0x0000000800000000)
eden space 614144K, 12% used [0x00000007aaab0000,0x00000007af31cc10,0x00000007d0270000)
from space 10048K, 75% used [0x00000007d0c40000,0x00000007d13a7840,0x00000007d1610000)
to space 10048K, 0% used [0x00000007d0270000,0x00000007d0270000,0x00000007d0c40000)
ParOldGen total 63872K, used 20737K [0x0000000700000000, 0x0000000703e60000, 0x00000007aaab0000)
object space 63872K, 32% used [0x0000000700000000,0x0000000701440750,0x0000000703e60000)
PSPermGen total 21248K, used 10417K [0x00000006fae00000, 0x00000006fc2c0000, 0x0000000700000000)
object space 21248K, 49% used [0x00000006fae00000,0x00000006fb82c6d0,0x00000006fc2c0000)
Dumping Java heap ... allocation sites ... done.
For an excellent explanation of how the VM's memory spaces are structured please visit http://sdoulger.blogspot.co.uk/2011/03/excellent-post-from-mark-nelson.html