I have been monitoring Mango's memory consumption over the past week, and it recently stopped responding on us. This is the second time this has happened in the past two weeks; it seems to take about a week to reach the point of not responding.
Here are some graphs of the memory consumption and REST and login page response times.
The load average is coming from Mango's internal monitoring and you can see spikes every night at 1am when we have the backup scheduled.
The % memory used is from the system, but according to top, Java is the biggest memory consumer. As of this writing it is responsible for 48% of memory usage and 195% CPU load. The % memory used started around 45% about two weeks ago and plateaued a bit above 70%. When Mango became unresponsive to web requests today, I restarted the server, but memory consumption climbed back to around 70% almost immediately.
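For what it's worth, the per-process figures above are just read off top by hand. In case it's useful, here is a rough sketch of how they could be logged on a schedule instead; it assumes psutil is installed and that the Mango JVM shows up as a process named simply "java", which may not match every setup.

```python
# Sketch: periodically log the memory/CPU share of any process named "java".
# Assumes the psutil package is installed; nothing here is Mango-specific.
import time
import psutil

def java_processes():
    """Yield running processes whose name is 'java'."""
    for proc in psutil.process_iter(["name"]):
        if proc.info["name"] == "java":
            yield proc

if __name__ == "__main__":
    while True:
        for proc in java_processes():
            # memory_percent(): resident set size as a % of total RAM.
            # cpu_percent(interval=1.0): CPU usage over a 1 s window; it can
            # exceed 100% on multi-core machines, like the ~195% seen in top.
            print(f"{time.strftime('%Y-%m-%dT%H:%M:%S')} "
                  f"pid={proc.pid} "
                  f"mem={proc.memory_percent():.1f}% "
                  f"cpu={proc.cpu_percent(interval=1.0):.1f}%")
        time.sleep(60)
```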
The local REST API response times and HTTPS ui/login response times are measured from localhost and go through a locally running Apache proxy, which serves up the SSL certs.
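To make concrete what "from localhost" means here, the measurement amounts to roughly the sketch below: plain HTTPS GETs against the local proxy, timed with a wall clock. The URLs are placeholders rather than our actual Mango endpoints, and certificate verification is disabled because the proxy's cert is issued for the public hostname, not localhost.

```python
# Sketch: time a couple of HTTPS requests against the local Apache proxy.
# The URLs are hypothetical placeholders; substitute the real paths.
import time
import requests

URLS = {
    "login": "https://localhost/login",        # placeholder login page
    "rest": "https://localhost/rest/status",   # placeholder REST endpoint
}

def sample(urls):
    """Return the wall-clock response time in seconds for each URL."""
    timings = {}
    for name, url in urls.items():
        start = time.monotonic()
        requests.get(url, verify=False, timeout=30)
        timings[name] = time.monotonic() - start
    return timings

if __name__ == "__main__":
    for name, seconds in sample(URLS).items():
        print(f"{name}: {seconds * 1000:.0f} ms")
```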
We are running this on an Amazon EC2 t2.medium with 4 GB of memory and 2 CPU cores.
There are a ton of errors in our logs which can be viewed in their entirety here: https://gist.githubusercontent.com/anonymous/bef071adf061ec4e63f0fe2d3b8fa854/raw/540bcaa9aac3abdfd33ab7ff003b24165d622862/mango-logs
I admit I am a tad overwhelmed by the amount of stuff in the logs, and I find most of it difficult to understand without knowing the internals of Mango. Some errors of note, though:
ERROR 2018-02-26T21:15:07,109 (com.infiniteautomation.nosql.MangoNoSqlBatchWriteBehindManager$StatusProvider.scheduleTimeout:731) - 3 BWB Task Failures, first is: Task Queue Full
Feb 27 16:20:58 Nusak.NUS.iA3.io ma.sh: Java HotSpot(TM) 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGTERM to handler- the VM may need to be forcibly terminated
BTW, we are using Oracle Java (not OpenJDK) on Arch Linux.