100% CPU after user login attempt

Brad_GMI

Our central server is having an issue where, when a user attempts to login, CPU utilization grows until it reaches 100%.

Occasionally, a user will be authorized, but nothing will load.

This is one entry from a recent log that seems related:

ERROR 2018-07-30 10:37:48,116 (com.serotonin.m2m2.rt.RuntimeManager.stopDataSourceShutdown:413) - Data source 'Zirkle Facility D' failed proper termination.
org.springframework.dao.CannotAcquireLockException: PreparedStatementCallback; SQL [SELECT dp.id,dp.data,dp.xid,dp.dataSourceId,dp.name,dp.deviceName,dp.enabled,dp.pointFolderId,dp.loggingType,dp.intervalLoggingPeriodType,dp.intervalLog$
SELECT dp.id,dp.data,dp.xid,dp.dataSourceId,dp.name,dp.deviceName,dp.enabled,dp.pointFolderId,dp.loggingType,dp.intervalLoggingPeriodType,dp.intervalLoggingPeriod,dp.intervalLoggingType,dp.tolerance,dp.purgeOverride,dp.purgeType,dp.purg$
SELECT dp.id,dp.data,dp.xid,dp.dataSourceId,dp.name,dp.deviceName,dp.enabled,dp.pointFolderId,dp.loggingType,dp.intervalLoggingPeriodType,dp.intervalLoggingPeriod,dp.intervalLoggingType,dp.tolerance,dp.purgeOverride,dp.purgeType,dp.purg$
        at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:259)
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:645)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:680)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:712)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:722)
        at com.serotonin.db.spring.ExtendedJdbcTemplate.queryForObject(ExtendedJdbcTemplate.java:39)
        at com.serotonin.db.DaoUtils.queryForObject(DaoUtils.java:332)
        at com.serotonin.m2m2.db.dao.AbstractBasicDao.get(AbstractBasicDao.java:549)
        at com.serotonin.m2m2.vo.permission.Permissions.hasEventTypePermission(Permissions.java:248)
        at com.serotonin.m2m2.rt.EventManager.deactivateEvents(EventManager.java:316)
        at com.serotonin.m2m2.rt.EventManager.cancelEventsForDataPoint(EventManager.java:529)
        at com.serotonin.m2m2.rt.dataImage.DataPointRT.terminate(DataPointRT.java:699)
        at com.serotonin.m2m2.rt.RuntimeManager.stopDataPointShutdown(RuntimeManager.java:561)
        at com.serotonin.m2m2.rt.RuntimeManager.stopDataSourceShutdown(RuntimeManager.java:402)
        at com.serotonin.m2m2.rt.DataSourceGroupTerminator$DataSourceSubGroupTerminator.run(DataSourceGroupTerminator.java:141)
        at com.serotonin.timer.TimerTask.runTask(TimerTask.java:148)
        at com.serotonin.timer.TimerTask.run(TimerTask.java:119)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table "DATAPOINTS"; SQL statement:
SELECT dp.id,dp.data,dp.xid,dp.dataSourceId,dp.name,dp.deviceName,dp.enabled,dp.pointFolderId,dp.loggingType,dp.intervalLoggingPeriodType,dp.intervalLoggingPeriod,dp.intervalLoggingType,dp.tolerance,dp.purgeOverride,dp.purgeType,dp.purg$
        at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
        at org.h2.message.DbException.get(DbException.java:179)
        at org.h2.message.DbException.get(DbException.java:155)
        at org.h2.table.RegularTable.doLock1(RegularTable.java:504)
        at org.h2.table.RegularTable.lock(RegularTable.java:469)
        at org.h2.table.TableFilter.lock(TableFilter.java:146)
        at org.h2.command.dml.Select.queryWithoutCache(Select.java:630)
        at org.h2.command.dml.Query.query(Query.java:322)
        at org.h2.command.dml.Query.query(Query.java:290)
        at org.h2.command.dml.Query.query(Query.java:36)
        at org.h2.command.CommandContainer.query(CommandContainer.java:90)
        at org.h2.command.Command.executeQuery(Command.java:197)
        at org.h2.jdbc.JdbcPreparedStatement.executeQuery(JdbcPreparedStatement.java:108)
        at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:688)
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:629)
        ... 18 more

Any ideas?

phildunlap

Hi Brad,

How large is your Mango/databases/mah2.h2.db file? My guess is that some tables have gotten large, or the database has itself grown very large, and purging those tables then going through a backup / restore cycle via,

SQL backup now (system settings page)
Shutdown Mango, move existing Mango/databases/mah2.h2.db file for safe keeping
Start Mango
Do an SQL restore from the backup in part 1

will shrink the database and make it run somewhat faster. If you have an automatic SQL backup and the configuration hasn't changed since it, you can use that. It looks like you copied console output without getting the end of the lines, so I can't read the whole error messages. Did you try loading pages in both the old and new UI? You may need to purge your events and userEvents tables using the Mango/bin/h2-web-console.sh script to help the UI load faster, if you need to get into it.

Edit: After that I would wonder how much it is correlated with user login. If it weren't the database being huge, I would wonder if there is any event handling on user login events added by you. I would what version this is, also

Brad_GMI

This is version 2.8.

The systems resides in an AWS EC2 instance. Currently the mah2.h2.db file is 12 GB.

I don't have any additional event handling that I am aware of.

Currently, there is a lock file that is hindering the web console.

I'm not familiar with the H2 database, so I don't want to get to far away from home. If i move this lock file, I should be able to clean the event table?

phildunlap

Because the H2 database is launched in embedded mode, it's advisable to only have one program using it at a time. So, you should stop Mango to use the web console. The commands to delete from the events, userEvents, and audit tables are:

delete from events;
delete from userEvents;
delete from audit;

Definitely a 12GB database sounds like the issue. You will probably see a major shrinkage just from restoring an existing backup before clearing out those tables, but for best results delete the contents of the tables, make a backup, start up clean and restore it. I've also provided instructions on how to run a restore on the command line, if you wanted to check if a restore of last night's SQL backup would produce a significantly smaller mah2.h2.db file, in this thread: https://forum.infiniteautomation.com/topic/2748/how-to-restore-a-database-backup

Brad_GMI

Thanks for the information. I suspect that will get me going in the right direction.

The lastest piece of the puzzle is the service will not shutdown.

The server stops listening on 8080, but a portion of the persistent connections still remain connected according to netstat.

If I kill the PID, I suspect the lock will not be released?

phildunlap

I don't think you should run into any locked file issues. Kill away.

Brad_GMI

It appears as though I am back in business. This has apparently been going on since July 5.

I restored the backup from July 4 and all appears to be functioning.

The database is sitting at 3 GB now. Any idea what would cause such a dramatic increase?

Thank you for the assistance!

phildunlap

3GB still seems pretty big to me. Do you have the NoSQL module installed?

Any idea what would cause such a dramatic increase?

Recently we discovered that updates to blob columns in the H2 database will expand the database. This was particularly an issue for things with Persistent TCP Publishers that had sync'ing enabled, as unto recently the tracking of how far points had sync'ed was done in a single blob for the whole publisher, which meant an update for every point that sync'ed and thus a continuously expanding database.

So, some ways to deal with that would be converting to MySQL, or updating to the latest Mango where we have mitigated at least that source of growth via move the tracking of sync times into its own, non-blob database table.

At 3GB (if you have NoSQL installed) I would still expect there to be a considerable amount of clutter making that DB large, and you may benefit further from another purge-restore. If you aren't using NoSQL I would just assume that's the pointValues table getting large.

Brad_GMI

This is a enterprise install with NoSQL.

There are currently ~35 Persistent TCP datasources in this install.

I did notice today that the datasource page will not load the datasources, but the dashboard and datapoints pages appear to be displaying the data properly.

I'm going to check an older database and see if that helps.

phildunlap

I have a suspicion the data source page issue would be the one in this thread, with the course of resolution I suspect you will use provided by Jared at the end (unless you are going to upgrade the instance): https://forum.infiniteautomation.com/topic/3551/data-sources-does-not-appear

The problem was more pronounced on the publisher, but I could see how points being updated could cause that. I would really consider moving your installation to MySQL, since you are on 2.8 and MySQL doesn't have the issue.