@phildunlap
Have you checked your log file and events for any errors in writing?
I did not find any ERRORs, but when I searched for tsdb I found WARNings:
07-14-2019-1.ma.log:WARN 2019-07-14T13:46:10,746 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
07-12-2019-1.ma.log:WARN 2019-07-12T05:05:10,802 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
07-12-2019-1.ma.log:WARN 2019-07-12T06:35:10,970 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
07-12-2019-1.ma.log:WARN 2019-07-12T12:13:51,286 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
07-12-2019-1.ma.log:WARN 2019-07-12T12:31:48,767 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
07-12-2019-1.ma.log:WARN 2019-07-12T13:20:10,772 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
07-12-2019-1.ma.log:WARN 2019-07-12T16:35:10,747 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
07-12-2019-1.ma.log:WARN 2019-07-12T17:35:11,002 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
07-13-2019-1.ma.log:WARN 2019-07-13T12:53:03,262 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
07-13-2019-1.ma.log:WARN 2019-07-13T16:04:19,245 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
07-14-2019-1.ma.log:WARN 2019-07-14T04:35:10,769 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
07-14-2019-1.ma.log:WARN 2019-07-14T06:20:10,757 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
07-14-2019-1.ma.log:WARN 2019-07-14T13:46:10,746 (com.infiniteautomation.tsdb.impl.Backdates$BackdatePoster.runImpl:187) - The backdate poster ran without inserting, queue size: 1
I had noticed that message a few weeks ago, after running a Generate History on a metadata point, but I had thought that was related to a large meta point history generation and that it would resolve itself after the task queues caught up when the CPU utilization fell back down. By the time I was noticed the rising values to be written, I had forgotten about the backdate issue. I think the Generate History function may be triggering or exacerbating the problem. However, the values to be written was rising daily long before I started the Generate History. Either way, I don't know what to do about it.
About how fast is it rising?
The rate is irregular when viewed on an hour timescale: the samples to be written may go down every few minutes, but the increases exceed the decreases and a linear trend is seen in the graph. It has been averaging out to an additional 1,700 values per day. Day/1700 ≈ 0.847 min/value ≈ 50.8 sec/value, so not your typical cron job interval. Viewed on a 1-day timescale, the values to be written plot looks like a straight line with a steady rising slope.
System uptime is 3,500 hours (almost 146 days). 3,500h/272,458values = 46.24 seconds/value in the last 146 days. 146 days ago the values to be written was 445,000, rising at 14,000values/day. At restart that day the values to be written reset to zero, then rose at roughly 1,800values/day. That must be the day I last upgraded Mango. If I remember correctly, I had to kill mango many minutes after initiating shutdown because it was taking too long to complete and I was concerned about prolonging a Mango outage.
$ date -d 'now - 146 days'
Tue Feb 19 11:44:35 EST 2019
I poked around for large changes in internal data source readings before and after the update, and today, to see if they correlate with the rate of change in values to be written:
- Backdates running: usually 0 then and now.
- Backdates queued: Usually close to 0; 1 today. Spiked above 100 on June 20. The values to be written rose at a much faster rate around that time, but then the slope resumed to 1,700values/day. This was probably the day I ran the Generate History on a metadata point.
- Consecutive run polls: currently 0; spiked from 0 to 205,000 on June 15, then suddenly down to 0 on June 27.
- Currently scheduled high priority tasks is typically 12,000-18,000. Does not seem to correlate with other issues.
- Point value write threads is currently 3. It was 5 before the Feb 19 restart, and appears to be going up in proportion to the number of values to be written.
- Total backdates is currently 216,293. It had climbed slowly to 1,200 at which time it was reset to 0 at the Feb 19 restart. Since then it climbed slowly to around 3,000 on May 1, at which point it shot up suddenly to almost 60,000 that day, then resumed its slower rate of increase. On June 21 it shot up suddenly from about 60,000 to 215,000, then resumed its slower increase rate.
I believe the total backdates shot up when I ran a Generate History, and the system has not caught up with the backlog. This is odd because CPU utilization is usually more than 80% idle and CPU % IO wait is currently 0. If I had not ran the Generate History, it seems that these numbers would still be increasing indefinitely, but not at such a fast rate.