JVM Performance Monitoring Part 1: Troubleshooting GC logs
Performance and memory issues in ThingWorx often appear at the Java Virtual Machine (JVM) level. We can frequently detect these issues by monitoring JVM garbage collection logs and thread dumps.
In this first of a multi-part series, let's discuss how to analyze JVM garbage collection (GC) logs to detect memory issues in our application.
What are GC logs? When enabled on the Apache server, the logs will show the memory usage over time and any memory allocation issues (See: How to enable GC logging in our KB for details on enabling this logging level). Garbage Collection logs capture all memory cleanup operations performed by the JVM. We can determine the proper memory configuration and whether we have memory issues by analyzing GC logs.
Where do I find the GC logs? When configured as per our KB article, GC logs will be written to your Apache logs folder. Check both the gc.out and the gc.restart files for issues if present, especially if the server was restarted when experiencing problems.
How do I read GC logs? There are a number of free tools for GC log analysis (e.g. http://gceasy.io, GCViewer). But GC logs can also be analyzed manually in a text editor for common issues.
Reading GC files:
Generally each memory cleanup operation is printed like this:
Additional information on specific GC operations you might see are available from Oracle.
A GC analysis tool will convert the text representation of the cleanups into a more easily readable graph:
Three key scenarios indicating memory issues:
Let's look at some common scenarios that might indicate a memory issue. A case should be opened with Technical Support if any of the following are observed:
Extremely long Full GC operations (30 seconds+)
Full GCs occurring in a loop
Full GCs: Full GCs are undesirable as they are a ‘stop-the-world’ operation. The longer the JVM pause, the more impact end users will see:
All other active threads are stopped while the JVM attempts to make more memory available
Full GC take considerable time (sometimes minutes) during which the application will appear unresponsive
Example – This full GC takes 46 seconds during which time all other user activity would be stopped: