The good news: I finally migrated the data. I think.
The bad news: I still don't know why Java crashes.
In frustration, I finally ran the migration with 1 low-priority thread, and in about 15 minutes, it finally completed — sort of.
I tried updating Java to 1.8.0_152: no effect
I added 4GB of swap (in addition to the 1GB already configured): no effect
I executed the migration on a VirtualBox VM: no problem, success in about 3 minutes
Finally, I ran the migration on the server in question with 1 low-priority thread. It executed for 923 seconds, then stopped. I feared it had crashed again, but I noticed Mango was still alive, so I checked a few things.
That's when it got interesting.
The migration page was seemingly stalled at 19648161 of 38879054
0_1513389336077_migrate-success.png
But in the log file, it claimed to have successfully completed:
INFO 2017-12-15T16:03:36,024 (com.infiniteautomation.nosql.maint.MangoNoSqlMigrationWorkItem.execute:70) - Starting Data Migration, please wait...
INFO 2017-12-15T16:18:59,072 (com.infiniteautomation.nosql.maint.MangoNoSqlMigrationWorkItem.execute:151) - Migrated 19648161 point values.
INFO 2017-12-15T16:18:59,072 (com.infiniteautomation.nosql.maint.MangoNoSqlMigrationWorkItem.execute:155) - Finished Data Migration, 19648161 point values migrated to Time Series Datastore took 923.048s
Eventually my eye caught that the 19648161 was (approximately) the same number I had seen in my successful simulation on the VirtualBox instance:
0_1513391609589_migrate-aldarch.png
But a quick query on the original db shows:
MariaDB [mango]> SELECT COUNT(*) FROM pointValues;
+----------+
| COUNT(*) |
+----------+
| 38879054 |
+----------+
Curiously, the “successful” migrations show about half the total. I also notice (especially on the actual server) that the total changes a few times during the migration.
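To put numbers on "about half", here is a quick sanity check using only the figures from the log and the COUNT(*) above. It is just arithmetic on what I observed, not a statement about how the migration works; the function name and variable names are my own.

```python
# Figures copied from the migration log and the MariaDB query above.
migrated = 19_648_161     # "Migrated 19648161 point values."
total_rows = 38_879_054   # SELECT COUNT(*) FROM pointValues
elapsed_s = 923.048       # "took 923.048s"


def migration_summary(migrated, total_rows, elapsed_s):
    """Return (rows not accounted for, fraction migrated, values per second)."""
    missing = total_rows - migrated
    fraction = migrated / total_rows
    rate = migrated / elapsed_s
    return missing, fraction, rate


missing, fraction, rate = migration_summary(migrated, total_rows, elapsed_s)
print(f"rows not accounted for: {missing}")
print(f"fraction migrated:      {fraction:.1%}")
print(f"throughput:             {rate:.0f} values/s")
```

So the reported count is just over half of the rows in the table, and the remaining rows are nearly as many as the ones reported as migrated.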
So please help me understand: did the data migrate? Or only half? What am I not understanding?
The crashing seems obviously to be a Java problem, but I can't glean any diagnostic information to pursue it further. Note that this instance is running on an ARM architecture, so that might be a factor as well.