So, it seems that the memory usage chart from my last post is "normal" for the JVM, and I have misinterpreted the large upward spikes in free memory as restarts when they are, in fact, normal gc behavior. Would you agree with this assessment?
For the most part, yes, but in a chart of the internal free-memory data point the two can look the same, since both a restart and a full gc will spike back up to lots of free memory. If there were a horizontal gap in the chart we could reason that the point has no data there, which would suggest a restart, but that is probably an unnecessary level of inference when the crash times should be recorded in ma.log, ma-script.log, possibly /var/log/messages, and an hs_err file if it was a JVM crash.
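Something along these lines is usually enough to pull that evidence together. It is only a sketch: the paths assume a default install under /opt/mango with logs in /opt/mango/logs, and the grep patterns are guesses at what the relevant log lines contain, so adjust both to what you actually see on the unit.

```bash
# Rough sketch for gathering restart/crash evidence on a MangoES.
# Assumes a default install under /opt/mango; adjust paths if yours differs.

# hs_err files are written by the JVM itself on a hard crash, so their
# timestamps tell you when a crash happened.
ls -l /opt/mango/hs_err_pid*.log /opt/mango/logs/hs_err_pid*.log 2>/dev/null

# Scan Mango's own logs for anything that looks like a shutdown or startup;
# the pattern is only a guess, broaden it if nothing turns up.
grep -i -E 'starting|started|shutdown|terminat|fatal' \
    /opt/mango/logs/ma.log /opt/mango/logs/ma-script.log 2>/dev/null | tail -n 40
```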
These are MangoES devices, and all configuration was done through the Mango UI. I am unsure what other processes could be affecting OS memory usage that would not have been captured in a database copy. What OS information would be useful in troubleshooting other processes?
The scripts in the /opt/mango/bin/ext-enabled directory set the memory limits, and that information is not stored in the database, so it does not move over in a copy. I would expect the two ES's to have the same allocation in the memory-small.sh script in that directory, but it is worth checking. I believe the OOM Killer message in dmesg or /var/log/messages records the process that made the fatal allocation; if Java requested the memory and Java got killed, then we can be confident nothing else on the system is involved.
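If you want to check that quickly, a sketch like this covers both halves, the kernel side and the allocation side. The -Xms/-Xmx pattern is an assumption about how memory-small.sh sets the limits, so loosen the grep if it uses something else.

```bash
# Did the kernel's OOM Killer fire, and which process did it pick?
dmesg | grep -i -E 'out of memory|oom-killer|killed process' | tail -n 20
grep -i -E 'out of memory|oom-killer|killed process' /var/log/messages 2>/dev/null | tail -n 20

# Show the JVM memory flags configured on this unit so the two ES's can be
# compared side by side (assumes the usual -Xms/-Xmx style settings).
grep -E 'Xm[sx]|MaxMetaspace|MaxPermSize' /opt/mango/bin/ext-enabled/memory-small.sh
```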
Would it be a reasonable assessment to attribute the Jan 21 5am spike in free memory (original post) to gc operations? There was no one working on the system at that time, and memory use continued in the same pattern.
I was guessing that one to be a crash because it didn't run low on memory (< 100MB remaining) before doing the full gc, as your other charts have. I can't say for sure which it is from the chart alone.
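One way to take the guesswork out of it next time is to look at how long the Java process has actually been up: if the elapsed time is shorter than the time since the spike, it restarted; if it is longer, it was just a full gc. A minimal sketch, assuming the process can be found by matching "mango" on its command line:

```bash
# Report the Mango JVM's elapsed run time and resident memory.
# If etime is shorter than the time since the suspicious spike, the
# process restarted; otherwise the spike was garbage collection.
PID=$(pgrep -f 'java.*mango' | head -n 1)   # adjust the pattern to your command line
if [ -n "$PID" ]; then
    ps -o pid,etime,rss,cmd -p "$PID"
else
    echo "No Mango java process found"
fi
```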
If the issue has not been replicated, the only other troubleshooting step I can consider is getting a MangoES v3 to load the backup databases. I'm at a loss for ideas at this point because we are running on a device pre-configured for the Mango service. Do you have any ideas on what else could be running?
I would guess your deployed V3 ES is on JDK8 revision 131. The memory settings in the ext-enabled scripts are worth checking too. The actual board does differ between the V2 and the V3, but the V3 is more powerful.
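To confirm what the unit is actually running, the standard tools on the device are enough; nothing here is Mango-specific.

```bash
# Exact JDK build in use.
java -version 2>&1

# Top memory consumers; on a dedicated ES the Mango java process should
# dominate this list, so anything else large stands out.
ps aux --sort=-rss | head -n 10

# Overall memory picture, including cache the kernel can reclaim.
free -m
```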
After examining the files you sent me, I think one of the differences is the size of the data history available to those Meta points.
Based on your last post, the Java process itself does not appear to be the root cause of the behavior. Does that make sense or am I jumping to conclusions?
I am not aware of any memory leaks in the version I would expect to be on the deployed device (but funnily enough there is a memory leak in the version you were testing, 1.8.0_33, that they fixed in 102 or thereabouts!). Updating to the latest JDK8 isn't a bad idea.