Publisher queue discarded - Task Queue Full

phildunlap

Hmm. I do see database issues in the first log I checked. I think you'll see benefits from converting your database to MySQL. Did you try the backup / restore method of shrinking the H2 database?

mihairosu

Okay the Persistent TCP syncs are working again.

Man, we really need to get our Grafana up and running again so we can monitor our VMs... such a simple thing could have been caught much earlier.

I'm not sure if this makes a difference, but when I check the Runtime status on the Data Source, sometimes there are no connections, sometimes both are connected and sometimes only one is connected (we have 2 TCP syncs).

mihairosu

So I am running the Backups regularly. Are you saying I should just attempt to restore the latest backup?

Also, I thought H2 was the ideal database for our use case. Why would we consider going with MySQL or MariaDB?

phildunlap

That could definitely result in a smaller mah2 which may alleviate some of the memory strain. I would do that by renaming your existing mah2 (so as not to lose it if something isn't right) with Mango off, then starting clean and running the SQL restore.
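
As a rough sketch of the "rename before restoring" step, something like the following Python helper could set the existing files aside. The directory path, the `mah2` file prefix, and the function name `set_aside_h2_files` are assumptions for illustration, not part of Mango itself; check where your install actually keeps its H2 files, and only run this with Mango stopped.

```python
from datetime import datetime
from pathlib import Path


def set_aside_h2_files(db_dir, prefix="mah2", stamp=None):
    """Rename every file in db_dir whose name starts with prefix.

    Each file gets a ".bak-<stamp>" suffix so the original database is
    kept and can be renamed back if the restore does not work out.
    Returns the list of new paths. The prefix and layout are assumptions.
    """
    db_dir = Path(db_dir)
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    renamed = []
    for path in sorted(db_dir.iterdir()):
        # Skip directories, unrelated files, and files already set aside.
        if not path.is_file():
            continue
        if not path.name.startswith(prefix) or ".bak-" in path.name:
            continue
        target = path.with_name(f"{path.name}.bak-{stamp}")
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        path.rename(target)
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    # Hypothetical location; adjust to your own install.
    for new_path in set_aside_h2_files("/opt/mango/databases"):
        print("set aside:", new_path)
```

With the old files renamed, Mango can start clean and the SQL restore can be run against the fresh database; if anything looks wrong, stop Mango and rename the files back.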

The smaller the system, the more it makes sense to weight H2 above MySQL. But, if your database is large, MySQL and MariaDB are both very capable of handling it. H2 should be as well, and as I mentioned in the other thread there were some major improvements in the recent version of H2, which will be bundled in our next release.

mihairosu

By the way, I did try backup and restore database on the MangoES and it did not help with the Publisher errors.

For now I have to turn off logging on the publisher.

phildunlap

Whoa, I just took a closer look at your publisher settings. I would try adjusting those if it's a performance problem on the publisher's side.

If you're using NoSQL on both sides, increase the 'minimum overlap when syncing blocks of data' to 1000.

Regardless of the database setup, lower your sync threads to between 2 and 6 I'd say. I'd probably go with 3, personally.

mihairosu

Okay great, thanks Phil, I will try that.

phildunlap

I would say the resolution was found in increasing the data source's timeout from 5000 to 45000. Many other things also transpired, but I believe the source of the issues was the data source timing out during the connection. This then led to an enormous audit table due to this issue: https://github.com/infiniteautomation/ma-core-public/issues/1188 and that led to some circuitous troubleshooting. Thanks for your patience and letting me look into it!

mihairosu

Wow, yeah, thank you so much for spending your time troubleshooting our problems.

Everything is running smoothly again, and with the new settings you recommended (such as purging the events at intervals less than 1 year hahah) we should be pretty well set.