ma.log filling up with NoSQL Batch Writer Fatal Error
-
For tracking purposes, I wanted to let you know that we got another corrupt file a night or two ago. It again was from one of our Runtime scripts. I went through every one and clicked the validate button, and sure enough one complained about the file.
I've run out of ideas on what could be locking the file outside of Mango. I'll ask if our infrastructure team can do another sweep of the system to see if anything sticks out.
Thanks,
Chad
-
We've now had 3 more files get corrupted. All of them were being written to by our scripting data sources that hit the /data-source/status endpoint.
Is there a way to get this looked into more closely? Should I open a support ticket?
100% of these errors have been with datasources we use to hit the status endpoint.
ERROR 2022-08-30T17:12:49,279 [high-pool-2-thread-159790 --> Polling Data Source: datasource_name_here-runtime_status] - Uncaught Task Exception com.serotonin.ShouldNeverHappenException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\Databases\mangoTSDB\3\41918\773.data.rev (Access is denied)
ERROR 2022-08-30T17:12:52,544 [high-pool-2-thread-159645 --> Polling Data Source: datasource_name_here-runtime_status] - Uncaught Task Exception com.serotonin.ShouldNeverHappenException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\Databases\mangoTSDB\54\33945\773.data.rev (Access is denied)
ERROR 2022-08-30T17:12:56,294 [high-pool-2-thread-159759 --> Polling Data Source: datasource_name_here-runtime_status] - Uncaught Task Exception com.serotonin.ShouldNeverHappenException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\Databases\mangoTSDB\78\23157\773.data.rev (Access is denied)
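For anyone wanting to tally which data sources and files are affected, here is a minimal sketch that pulls both out of entries in the format shown above. It is meant to run outside Mango (for example with Node.js) against text copied from ma.log. The function name findAccessDenied and the regular expression are my own assumptions based only on these three entries, so other log formats may not match.

```javascript
// Matches one "Access is denied" entry in the format shown in the log above.
// Capture 1 = data source name, capture 2 = path of the file that could not be opened.
var ACCESS_DENIED = /Polling Data Source: ([^\]]+)\] - .*?FileNotFoundException: (.+?) \(Access is denied\)/g;

// Hypothetical helper: returns one { dataSource, file } object per matching entry.
function findAccessDenied(logText) {
  var hits = [];
  var m;
  ACCESS_DENIED.lastIndex = 0; // the regex is global and reused, so reset its position
  while ((m = ACCESS_DENIED.exec(logText)) !== null) {
    hits.push({ dataSource: m[1], file: m[2] });
  }
  return hits;
}

// Example using the first entry from the log above (String.raw keeps the backslashes).
var sample = String.raw`ERROR 2022-08-30T17:12:49,279 [high-pool-2-thread-159790 --> Polling Data Source: datasource_name_here-runtime_status] - Uncaught Task Exception com.serotonin.ShouldNeverHappenException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\Databases\mangoTSDB\3\41918\773.data.rev (Access is denied)`;
console.log(findAccessDenied(sample));
```

Grouping the results by dataSource would show at a glance whether every hit really comes from a status-endpoint script.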
Thanks,
Chad
-
@cmusselm Push for a support ticket.
It's the only way to get ahold of them now.
-
@MattFox You're probably right. I'll do that here soon. As of now, I'm trying to re-create the issue in our test environment. We have 150+ scripts that run every 6 seconds to get the runtime info for various data sources in prod.
Hopefully doing this in test will yield similar issues and can help with additional troubleshooting.
-
One thing we're doing that could be contributing to this is using .set inside a for loop when we have pulled multiple runtime records.
for (var i = noRec; i > 0; i--) {
    var pollDuration = content.latestPolls[successLen-i].duration/1000;
    var pollTime = content.latestPolls[successLen-i].startTime;
    varPollDuration.set(pollDuration, epoch(pollTime));
} //End for
We're going to replace this with an API call to the /point-values endpoint to see if pushing the array of values and times to the data point will help resolve this.
I'll post an update when done and tested out.
-
@cmusselm Speaking from experience, if you start hammering the API, you can get dashboard connection issues.
Using the java calls is far better as it is more direct.
Perhaps try inserting a RuntimeManager.sleep() call instead?
Space out each write with ten milliseconds to give your system enough time between writes.
That and perhaps increase the cache size of those points to ten if you have not done so already.
Fox
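The suggestion above, applied to the loop posted earlier in the thread, might look roughly like the sketch below. The first three definitions are stand-ins I added so the snippet runs outside Mango (for example in Node.js); inside a Mango script, varPollDuration and RuntimeManager come from the scripting context and epoch is the poster's own helper, so those stand-ins would be dropped. The function name writeRecentPolls and the delayMs parameter are also assumptions for illustration.

```javascript
// --- Stand-ins so the sketch runs outside Mango (assumptions, not Mango code) ---
var writes = []; // records what would have been written to the point
var varPollDuration = {
  set: function (value, timestamp) { writes.push({ value: value, ts: timestamp }); }
};
var RuntimeManager = {
  // Crude blocking wait; in Mango, RuntimeManager.sleep() is supplied for you.
  sleep: function (ms) { var end = Date.now() + ms; while (Date.now() < end) {} }
};
// Hypothetical version of the thread's epoch() helper: time string -> epoch millis.
function epoch(time) { return new Date(time).getTime(); }

// Writes the newest `noRec` polls, oldest first, pausing between writes.
function writeRecentPolls(content, noRec, delayMs) {
  var successLen = content.latestPolls.length;
  for (var i = Math.min(noRec, successLen); i > 0; i--) {
    var poll = content.latestPolls[successLen - i];
    varPollDuration.set(poll.duration / 1000, epoch(poll.startTime));
    if (i > 1) {
      RuntimeManager.sleep(delayMs); // space out consecutive writes
    }
  }
}

// Example: write the two newest polls with 10 ms between them.
writeRecentPolls({
  latestPolls: [
    { duration: 1000, startTime: "2022-08-30T17:12:00Z" },
    { duration: 2500, startTime: "2022-08-30T17:12:06Z" },
    { duration: 4000, startTime: "2022-08-30T17:12:12Z" }
  ]
}, 2, 10);
console.log(writes);
```

Skipping the sleep after the last write keeps the script from idling longer than it needs to on each run.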
-
@MattFox Thanks for the insight. I'll definitely explore that and let you know what happens.
Chad
-
Quick update on my progress.
It took a little longer than desired, but we were finally able to get the corrupt files cleared out and all of our Runtime DS scripts updated with the sleep that @MattFox suggested, as well as setting the cache size to 10 for those points.
Hopefully all is good, and we don't see corrupt files any longer. I'll give it a month or so and post an update to hopefully close this out once and for all.
Thanks,
Chad
-
@cmusselm Good Luck!
Fox
-
It's been well over a month and we haven't seen the issue come back. I think it's safe to say the problem is resolved.
Thanks @MattFox for the suggestions and being active in the forum! Your posts have helped me more than you know.
-
@cmusselm Thanks for letting me know!