I have been tracking an issue that causes the MQTT data source to disconnect from its broker.
What I have been seeing in the ma.log is:
WARN 2020-02-21T08:38:12,594 (com.serotonin.m2m2.rt.dataSource.PollingDataSource.incrementUnsuccessfulPolls:150) - Data Source Application Virtual aborted 3093 polls since it started.
WARN 2020-02-21T08:45:15,170 (com.serotonin.m2m2.rt.dataSource.PollingDataSource.incrementUnsuccessfulPolls:150) - Data Source Mango Internal aborted 1494 polls since it started.
WARN 2020-02-21T08:45:15,191 (com.serotonin.m2m2.rt.dataSource.PollingDataSource.incrementUnsuccessfulPolls:150) - Data Source Internal Stats aborted 22 polls since it started.
ERROR 2020-02-21T08:45:15,249 (com.infiniteautomation.m2m2.mqtt.client.MqttClientDataSourceRT$1.connectionLost:632) - MQTT Connection Lost
org.eclipse.paho.client.mqttv3.MqttException: Connection lost
at org.eclipse.paho.client.mqttv3.internal.CommsSender.handleRunException(CommsSender.java:153) ~[?:?]
at org.eclipse.paho.client.mqttv3.internal.CommsSender.run(CommsSender.java:138) ~[?:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[?:1.8.0_212]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) ~[?:1.8.0_212]
at java.net.SocketOutputStream.write(SocketOutputStream.java:155) ~[?:1.8.0_212]
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.8.0_212]
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.8.0_212]
at org.eclipse.paho.client.mqttv3.internal.wire.MqttOutputStream.flush(MqttOutputStream.java:49) ~[?:?]
at org.eclipse.paho.client.mqttv3.internal.CommsSender.run(CommsSender.java:107) ~[?:?]
... 1 more
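For anyone comparing notes: the Paho client this stack trace comes from supports automatic reconnection (MqttConnectOptions.setAutomaticReconnect(true) in Paho 1.1+), though I don't know whether Mango's MQTT module enables it. As a reference only, the retry logic it implements looks roughly like this sketch (the Connector interface is hypothetical, not Mango's or Paho's actual code):

```java
import java.util.concurrent.TimeUnit;

public class ReconnectSketch {
    // Hypothetical stand-in for "open a socket to the broker".
    interface Connector {
        boolean tryConnect(); // true once the broker accepts the connection
    }

    /**
     * Retry with exponential backoff, capped at 30 s between attempts.
     * Returns the attempt number that succeeded, or -1 after giving up.
     */
    static int connectWithBackoff(Connector c, int maxAttempts, long baseDelayMs)
            throws InterruptedException {
        long delay = baseDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (c.tryConnect()) {
                return attempt;
            }
            TimeUnit.MILLISECONDS.sleep(delay);
            delay = Math.min(delay * 2, 30_000); // double the wait, capped
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated broker that rejects the first two attempts, standing in
        // for the period while the socket is down ("Broken pipe").
        final int[] calls = {0};
        Connector flaky = () -> ++calls[0] >= 3;
        int attempts = connectWithBackoff(flaky, 10, 1);
        System.out.println("connected after " + attempts + " attempts");
    }
}
```

With the simulated broker above, the loop succeeds on the third attempt; a real client would also need to re-subscribe after reconnecting, or messages published in the gap are still missed.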
These errors occurred on a fairly regular cadence of about 3.5 hours. It wasn't until yesterday, when I got the Mango Internal data source working, that I saw just how regular it was.
The log entries occur when the free memory is "reset" back to 1.2 GB, which I assume is a full garbage collection. The one anomaly in the middle is when the backup/purge occurs, at which time there are log entries about both the aborted polls and the MQTT connection being lost.
I have 2 environments set up with a very similar load:
VMs with 2 GB memory allocated, 4 GB swap
JVM: ~80% memory usage, 0-15% CPU usage
JAVAOPTS="$JAVAOPTS -Xms1600m -Xmx1600m"
1850 data points
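In case it helps anyone reproduce this: to check whether the 3.5-hour free-memory "reset" is a full GC pausing the JVM long enough to break the MQTT socket, GC logging could be enabled alongside the existing heap settings. These are standard Java 8 HotSpot flags (the stack trace shows 1.8.0_212); the log path is an example, not Mango's default:

```shell
# Existing heap settings, plus Java 8 GC logging with rotation.
JAVAOPTS="$JAVAOPTS -Xms1600m -Xmx1600m"
JAVAOPTS="$JAVAOPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JAVAOPTS="$JAVAOPTS -Xloggc:/opt/mango/logs/gc.log"
JAVAOPTS="$JAVAOPTS -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10M"
```

If the GC log shows long stop-the-world pauses at the same timestamps as the "MQTT Connection Lost" entries, that would point at memory pressure rather than the broker.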
The dev environment, running v3.5.6, shows a very different free-memory profile. Other than log entries at the time of the backup/purge, its MQTT connection has not been lost.
(You can see the influence of the backup/purge also in this chart)
I am concerned about the MQTT connection being lost, since data may be missed while it is down. This happened once during a backup, when the connection was lost for 10 minutes.
Is this issue due to the limited resources (memory) on the VM? I am not convinced it is, as the dev environment does not show similar behaviour. Is it due to v3.6.5 (compared with v3.5.6)?
Is this behaviour expected during a backup/purge? Can it be mitigated?
Any ideas would be great.
PS: as another comparison, I have a similar app running v3.5.6 on a physical box with 32 GB of memory, a 2.8 GB JVM heap, and 2300 data points, and I have not seen this behaviour (MQTT connection lost) there.