Internal data source attribute 'Point values to be written' keeps climbing. Mitigation strategies?
-
Thank you for going into such detail! It may take me a couple days to test things / read around to see if I can reproduce what you're reporting, but I just wanted to say thanks for so much information!
Either way, I don't know what to do about it.
If it's just the count getting off and not values accruing in memory, it's not a big problem. It won't interfere with writing the data that is actually queued to get written, and so there would not be an increasing amount of memory allocated to this task, just a number that doesn't accurately reflect the size of the queue (the values waiting aren't counted in the queue; they are tracked, because it's much more efficient not to count the large queue).
-
@phildunlap said in Internal data source attribute 'Point values to be written' keeps climbing. Mitigation strategies?:
If it's just the count getting off and not values accruing in memory, it's not a big problem.
Thanks. Hopefully that's the case, but I think there are accrued values hiding somewhere, as that would explain the prolonged shutdowns I experienced when I restarted Mango, whereby I eventually had to kill Mango. Accrued values aside, I need to be able to generate history after creating or changing metadata points, as they are of limited value without a history. I'm thinking that whatever is underlying this symptom is also affecting my ability to generate metadata value histories.
Sometimes when I generate history I'm wondering if I'm going to push Mango over the edge, where excessive resource demands cause a positive feedback loop of error messages that demand more resources, eventually requiring a restart for Mango to catch up. This occurred a year or two ago, but I can't recall whether it was a history generation or an Internet outage (and inability to send email alarms) that kicked off the positive feedback loop.
It would be nice to have automatic ad-hoc metadata points as a feature enhancement: if the metadata value is not found in the TSDB point log (e.g. because it was not logged), then it should be calculated at the time it is retrieved. This would result in calculating historical values only when the data is being retrieved, thus reducing computational load and the storage demand, yet still making it available for review or download on the occasion where the data is needed for analysis. Since the value would be calculated upon retrieval, values that are never retrieved would not have to be calculated or recorded. The values could optionally be logged when they are calculated, as they are now. Live values would still be displayed. Should I submit this feature request on Github?
-
@Pedro I was just reviewing this and had a few questions:
-
Are you seeing any Data Lost events? If so, these should enlighten us as to why this is happening; they get raised whenever a batch fails to write. I would make sure that event level is set to something other than Do Not Log too.
-
Do you have any Alphanumeric points that would be saving very large strings of text? The NoSQL database has a limit to the size of each entry, which is large, but it's possible this could cause these symptoms.
-
-
I recommend changing the point clean interval (ms) and the stale data period (ms) to 60000, which is at least currently the default. There is a decent chance this is actually causing the issue, but we're still investigating.
-
@terrypacker said in Internal data source attribute 'Point values to be written' keeps climbing. Mitigation strategies?:
Are you seeing any Data Lost events?
I have not seen Data Lost events. However, since I've had to kill Mango to complete the last several restarts, if there were a data loss it would not have generated an event at that time.
If so, these should enlighten us as to why this is happening; they get raised whenever a batch fails to write.
Thanks, that's reassuring. I had not remembered that there was such an event.
I would make sure that event level is set to something other than Do Not Log too.
I had apparently set that event to critical. I did not see any such events, and in the last month I re-enabled forwarding of critical event emails through my mobile phone network's email-to-SMS gateway. My phone sounds an attention-getting submarine dive alarm when receiving Mango text messages.
Do you have any Alphanumeric points that would be saving very large strings of text?
Only one. I tried listing ALPHANUMERIC points in the table on the data_sources page, but for some reason it would not show a full query response (there were hourglasses at the bottom of the list). In any case, the JSONata command-line utility jfq enables me to specify more detailed reporting of the Mango configuration.
Query:
# Show name and loggingType and context update of all enabled alphanumeric points that are configured to log their values:
$ jfq 'dataPoints[pointLocator.dataType="ALPHANUMERIC"][enabled=true][loggingType!="NONE"].[name,enabled,loggingType,pointLocator.context]' Mango-Configuration-Jul-17-2019_000500.json
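For anyone without jfq installed, the same filter can be sketched in plain Python against the exported configuration file. This is only a rough equivalent of the query above, not a tested replacement: the key names (`dataPoints`, `pointLocator.dataType`, `enabled`, `loggingType`, `name`, `pointLocator.context`) are taken from the query itself, while the function names are made up for illustration.

```python
import json


def load_config(path):
    """Read an exported Mango configuration JSON file (hypothetical helper)."""
    with open(path) as handle:
        return json.load(handle)


def logged_alphanumeric_points(config):
    """Return [name, enabled, loggingType, context] for every enabled
    ALPHANUMERIC point whose loggingType is something other than NONE."""
    rows = []
    for point in config.get("dataPoints", []):
        locator = point.get("pointLocator", {})
        if (
            locator.get("dataType") == "ALPHANUMERIC"
            and point.get("enabled") is True
            and point.get("loggingType") != "NONE"
        ):
            rows.append([
                point.get("name"),
                point.get("enabled"),
                point.get("loggingType"),
                locator.get("context"),
            ])
    return rows


# Usage, assuming the export file from the query above is in the current directory:
#   config = load_config("Mango-Configuration-Jul-17-2019_000500.json")
#   for row in logged_alphanumeric_points(config):
#       print(row)
```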
I generate a large JSON structure to predict the tides based on pressure sensor data. The table is displayed in a Mango 2.x graphical view via a server-side script graphical object that converts the JSON to an HTML table. The table is apparently using the Alphanumeric_Default template, which saves "When point value changes." However, it only updates context when the tide direction changes, which is only a handful of times per day. The only other alphanumeric point being logged has a context that triggers only twice a year.
The NoSQL database has a limit to the size of each entry, which is large, but it's possible this could cause these symptoms.
The Point values to be written rises by about 1,700 values per day, which far exceeds the number of times the tide changes direction in one day.
I see there is now a Failed Login event. That's wonderful, thanks.
I just changed the point clean interval (ms) and the stale data period (ms) to 60000. I will report back whether or not I see a change in the trend of the Point values to be written. It should take an hour or two to see a change.
Thanks for your help.
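One way to judge whether the counter has stopped climbing, rather than eyeballing a chart, is to fit a straight line through periodic readings and look at the slope. The sketch below is a generic least-squares fit, not anything built into Mango. It assumes the readings have already been collected by hand or from an export as (hours elapsed, counter value) pairs, and the function name is hypothetical.

```python
def slope_per_day(samples):
    """Least-squares slope of (hours_elapsed, counter_value) samples,
    expressed in values per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    # cov / var_t is the slope in values per hour; scale to values per day.
    return 24.0 * cov / var_t


# Usage with made-up readings taken one hour apart:
#   slope_per_day([(0, 276190), (1, 276200), (2, 276195), (3, 276205)])
# A slope near the 1,700 values per day seen earlier would mean the counter
# is still climbing; a slope near zero would mean it has leveled off.
```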
-
Thanks very much: 3.5 hours after changing the TSDB settings, I see that the Point values to be written has leveled off, on average. Plotted on a one day scale, it will surely look like a flat horizontal line. Now it would be nice to figure out how to make it go down from 276,196 values to zero.
My current configuration:
$ jfq 'systemSettings' Mango-Configuration-Jul-17-2019_201152.json | grep -i nosql
"mangoNoSql.writeBehind.statusProviderPeriodMs": 5000,
"mangoNoSql.writeBehind.maxInstances": 10,
"mangoNoSql.backupHour": 4,
"mangoNoSql.backupEnabled": true,
"mangoNoSql.backupMinute": 0,
"mangoNoSql.writeBehind.maxInsertsPerPoint": 10000,
"mangoNoSql.backupPeriods": 1,
"mangoNoSql.backupFileCount": 3,
"mangoNoSql.backupIncremental": false,
"mangoNoSql.backupFileLocation": "/mnt/WesternDigitalUSB/mango-backup",
"mangoNoSql.corruptionScanThreadCount": 100,
"systemEventAlarmLevel.NOSQL_DATA_LOST": "CRITICAL",
"mangoNoSql.writeBehind.minInsertsPerPoint": 1000,
"mangoNoSql.writeBehind.stalePointDataPeriod": 60000,
"mangoNoSql.writeBehind.batchProcessPeriodMs": 500,
"mangoNoSql.writeBehind.maxRowsPerInstance": 100000,
"mangoNoSql.backupLastSuccessfulRun": "Jul-13-2019_040000",
"mangoNoSql.writeBehind.backdateDelay": 4985,
"action.noSqlBackup": "",
"mangoNoSql.writeBehind.stalePointCleanInterval": 60000,
"action.noSqlRestore": "",
"mangoNoSql.writeBehind.minDataFlushIntervalMs": 100,
"mangoNoSql.backupPeriodType": "WEEKS",
"mangoNoSql.writeBehind.spawnThreshold": 100000,
"mangoNoSql.intraShardPurge": false
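To confirm after future upgrades or restores that the two write-behind settings are still at the recommended value, a check along these lines could be run against the exported `systemSettings` object. It is a sketch only: it assumes that the UI labels "point clean interval (ms)" and "stale data period (ms)" correspond to the `stalePointCleanInterval` and `stalePointDataPeriod` keys in the listing above, and the function names are invented for illustration.

```python
# Assumed mapping of the two UI settings to their keys in the export.
RECOMMENDED = {
    "mangoNoSql.writeBehind.stalePointCleanInterval": 60000,
    "mangoNoSql.writeBehind.stalePointDataPeriod": 60000,
}


def write_behind_settings(system_settings):
    """Return only the mangoNoSql.writeBehind.* entries."""
    prefix = "mangoNoSql.writeBehind."
    return {k: v for k, v in system_settings.items() if k.startswith(prefix)}


def mismatched_settings(system_settings, expected=RECOMMENDED):
    """Return {key: (actual, expected)} for every setting that differs
    from the expected value; an empty dict means everything matches."""
    return {
        key: (system_settings.get(key), value)
        for key, value in expected.items()
        if system_settings.get(key) != value
    }


# Usage, assuming `config` is the parsed export file:
#   print(write_behind_settings(config["systemSettings"]))
#   print(mismatched_settings(config["systemSettings"]))
```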
-
Glad to hear it! Thanks for bringing this possibility to our attention, and with such detail along the way!
Alas the best way to zero it out would be restarting Mango.
-
@phildunlap said in Internal data source attribute 'Point values to be written' keeps climbing. Mitigation strategies?:
the best way to zero it out would be restarting Mango.
Ironically, restarting Mango will result in a prolonged data outage while the values are being written. Therefore it becomes a matter of choosing which data I want to lose (today's, or the accumulated values). I also wonder if those values are from a particular point, or from random points.
-
As predicted, the Point values to be written has continued to be fairly flat since yesterday's configuration change. It is fluctuating between 276,180 values and 276,230 values, except for a brief spike to 276,300 values. It is not trending up or down. I wonder what will happen next time I generate history on a Metadata point.
-
I would not expect a history generation to cause the issue again with the settings changed.
I see your point about the delay in shutdown. You can wait in the shutdown until it is reporting that it is waiting for batch write behind tasks to finish, and then use a kill -9 to stop Mango; this should not cause issues. It is possible to reflect out the counter and manually set it to something, but I think it would be better to restart Mango.