
    Please Note This forum exists for community support for the Mango product family and the Radix IoT Platform. Although Radix IoT employees participate in this forum from time to time, there is no guarantee of a response to anything posted here, nor can Radix IoT, LLC guarantee the accuracy of any information expressed or conveyed. Specific project questions from customers with active support contracts are asked to send requests to support@radixiot.com.


    ma.log filling up with NoSQL Batch Writer Fatal Error

    23 Posts 4 Posters 21.1k Views 4 Watching
    • cmusselm @terrypacker

      @terrypacker I'll check again with our infrastructure team to ensure that all antivirus software has the DB directories excluded, and that no other software could be hitting them.

      As for the internal data points, I posted a couple of questions about them in 2020, but based on the answers, they don't work the way we would need for our client.
      https://forum.mango-os.com/topic/4925/poll-success-percentage-over-what-timeframe

      https://forum.mango-os.com/topic/4948/question-on-internal-metric-previous-sequential-successful-polls

      I wouldn't think that this endpoint is causing the issue, but it seems odd that every time this happens, a Data Source calling that endpoint is generating the error. Do you know where the Data Source Runtime Status data is kept? Is it somewhere in the NoSQL file structure, and is there a way that we could tell Mango to keep more than the previous 10 data point values? If we could get it to store the previous 2 - 4 weeks, then we could probably use the status endpoint and not need a new data source to store its data for longer periods.

      Thanks,
      Chad

      • terrypacker @cmusselm

        @cmusselm

        "Do you know where the Data Source Runtime Status data is kept?"

        It's only in memory (cached on the running data source).

        "and is there a way that we could tell Mango to keep more than the previous 10 data point values?"

        This is hard coded to keep the last 10 poll statistics only.

        Just to be sure, I read through the status endpoint code, and I can't see how it would affect a shard. So either this is an impressive coincidence or we are missing something.

        • cmusselm

          For tracking purposes, I wanted to let you know that we got another corrupt file a night or two ago. It again was from one of our Runtime scripts. I went through every one and clicked the validate button, and sure enough one complained about the file.

          I've run out of ideas on what could be locking the file outside of Mango. I'll ask if our infrastructure team can do another sweep of the system to see if anything sticks out.

          Thanks,
          Chad

          • cmusselm

            We've now had 3 more files get corrupted. All of them were being written to by our scripting data sources that hit the /data-source/status endpoint.

            Is there a way to get this looked into more closely? Should I open a support ticket?

            100% of these errors have come from the data sources we use to hit the status endpoint.

            ERROR 2022-08-30T17:12:49,279 [high-pool-2-thread-159790 --> Polling Data Source: datasource_name_here-runtime_status] - Uncaught Task Exception 
            com.serotonin.ShouldNeverHappenException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\Databases\mangoTSDB\3\41918\773.data.rev (Access is denied)
            
            ERROR 2022-08-30T17:12:52,544 [high-pool-2-thread-159645 --> Polling Data Source: datasource_name_here-runtime_status] - Uncaught Task Exception 
            com.serotonin.ShouldNeverHappenException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\Databases\mangoTSDB\54\33945\773.data.rev (Access is denied)
            
            ERROR 2022-08-30T17:12:56,294 [high-pool-2-thread-159759 --> Polling Data Source: datasource_name_here-runtime_status] - Uncaught Task Exception 
            com.serotonin.ShouldNeverHappenException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\Databases\mangoTSDB\78\23157\773.data.rev (Access is denied)
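
For anyone tracking which shard files are affected, here is a minimal sketch that pulls the file paths out of log lines like the ones above. The function name `extractDeniedPaths` is hypothetical, and the pattern is based only on the sample lines in this post, so it may need adjusting for other message formats.

```javascript
// Collect the file path from every "Access is denied" FileNotFoundException
// found in a chunk of ma.log text. The pattern is an assumption derived from
// the sample lines in this thread.
function extractDeniedPaths(logText) {
    var pattern = /java\.io\.FileNotFoundException: (.+?) \(Access is denied\)/g;
    var paths = [];
    var match;
    while ((match = pattern.exec(logText)) !== null) {
        paths.push(match[1]);
    }
    return paths;
}

// Two of the lines quoted above, used as sample input.
var sampleLog = [
    "com.serotonin.ShouldNeverHappenException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\\Databases\\mangoTSDB\\3\\41918\\773.data.rev (Access is denied)",
    "com.serotonin.ShouldNeverHappenException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\\Databases\\mangoTSDB\\54\\33945\\773.data.rev (Access is denied)"
].join("\n");

var deniedPaths = extractDeniedPaths(sampleLog);
```

The resulting list can then be compared against the points written by each runtime-status script to confirm which data sources are involved.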
            

            Thanks,
            Chad

            • MattFox @cmusselm

              @cmusselm Push for a support ticket.
              It's the only way to get ahold of them now.

              Do not follow where the path may lead; go instead where there is no path.
              And leave a trail - Muriel Strode

              • cmusselm

                @MattFox You're probably right. I'll do that here soon. As of now, I'm trying to re-create the issue in our test environment. We have 150+ scripts that run every 6 seconds to get the runtime info for various data sources in prod.

                Hopefully doing this in test will yield similar issues and can help with additional troubleshooting.

                • cmusselm

                  One thing we're doing that could be contributing to this is using .set inside a for loop when we have pulled multiple runtime records.

                          // Write the most recent noRec polls, oldest first, each at its own timestamp.
                          for (var i = noRec; i > 0; i--) {
                              var poll = content.latestPolls[successLen - i];
                              var pollDuration = poll.duration / 1000; // presumably ms to seconds
                              varPollDuration.set(pollDuration, epoch(poll.startTime));
                          } // End for
                  

                  We're going to replace this with an API call to /point-values to see whether pushing the array of values and times to the data point in one request helps resolve this.

                  I'll post an update when done and tested out.
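
A rough sketch of the first half of that idea: collecting the same polls into a single array instead of calling .set once per poll. The function name `buildPollBatch` and the `value`/`timestamp` field names are hypothetical, and this assumes `successLen` in the loop above is the length of `latestPolls`. The actual request body expected by /point-values is not shown in this thread, so check it against the API documentation before sending anything.

```javascript
// Build one array holding the last `count` polls, oldest first, mirroring
// the order the original for loop writes them in.
// Assumption: the output field names are placeholders, not a confirmed
// /point-values payload format.
function buildPollBatch(latestPolls, count) {
    var start = Math.max(latestPolls.length - count, 0);
    var batch = [];
    for (var i = start; i < latestPolls.length; i++) {
        batch.push({
            value: latestPolls[i].duration / 1000, // same scaling as the loop
            timestamp: latestPolls[i].startTime    // left unconverted here
        });
    }
    return batch;
}

// Made-up poll records for illustration only.
var examplePolls = [
    { duration: 1200, startTime: 1000 },
    { duration: 1500, startTime: 7000 },
    { duration: 900, startTime: 13000 }
];
var exampleBatch = buildPollBatch(examplePolls, 2);
```

The original loop passes the start time through `epoch()`; the same conversion would apply to `timestamp` here before the batch is sent.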

                  • MattFox @cmusselm

                    @cmusselm Speaking from experience, if you start hammering the API, you can get dashboard connection issues.
                    Using the Java calls is far better, as they are more direct.
                    Perhaps try inserting a RuntimeManager.sleep() call instead?
                    Space out each write by ten milliseconds to give your system enough time between writes.
                    Also, perhaps increase the cache size of those points to ten, if you have not done so already.

                    Fox
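
A minimal sketch of the spacing suggested above, assuming the write loop from the earlier post. The function name `writeSpaced` is hypothetical, and the `RuntimeManager` and `varPollDuration` objects defined here are stand-ins so the example runs outside Mango; inside a Mango script those stand-ins should be removed so the real objects from the scripting environment are used.

```javascript
// Stand-ins that record calls instead of sleeping or writing point values.
// Delete these two definitions when pasting into a Mango script.
var calls = [];
var RuntimeManager = { sleep: function (ms) { calls.push("sleep " + ms); } };
var varPollDuration = { set: function (value, time) { calls.push("set " + value + " @ " + time); } };

// Write each poll to the point, pausing between writes (not after the last).
function writeSpaced(point, polls, delayMs) {
    for (var i = 0; i < polls.length; i++) {
        point.set(polls[i].duration / 1000, polls[i].startTime);
        if (i < polls.length - 1) {
            RuntimeManager.sleep(delayMs);
        }
    }
}

// Made-up poll records; ten milliseconds is the gap suggested above.
writeSpaced(varPollDuration, [
    { duration: 1500, startTime: 1000 },
    { duration: 2000, startTime: 7000 }
], 10);
```

With many scripts on a short poll period, the total sleep time per run (number of writes times the delay) should stay well under the poll period.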

                    • cmusselm

                      @MattFox Thanks for the insight. I'll definitely explore that and let you know what happens.

                      Chad

                      • cmusselm

                        Quick update on my progress.

                        It took a little longer than desired, but we were finally able to get the corrupt files cleared out and all of our Runtime DS scripts updated with the sleep that @MattFox suggested, as well as caching 10 DPs for those points.

                        Hopefully all is good, and we don't see corrupt files any longer. I'll give it a month or so and post an update to hopefully close this out once and for all.

                        Thanks,
                        Chad

                        • MattFox @cmusselm

                          @cmusselm Good Luck!

                          Fox

                          • cmusselm

                            It's been well over a month and we haven't seen the issue come back. I think it's safe to say the problem is resolved.

                            Thanks @MattFox for the suggestions and being active in the forum! Your posts have helped me more than you know.

                            • MattFox @cmusselm

                              @cmusselm Thanks for letting me know!
