
    Please Note This forum exists for community support for the Mango product family and the Radix IoT Platform. Although Radix IoT employees participate in this forum from time to time, there is no guarantee of a response to anything posted here, nor can Radix IoT, LLC guarantee the accuracy of any information expressed or conveyed. Specific project questions from customers with active support contracts are asked to send requests to support@radixiot.com.


    NoSQL Task Queue Full

    User help
    • mihairosu

      Yes, the generation produced a lot of values before it stopped, I'd say about half of them. Many tens of thousands of values.

      So in this instance you think it would benefit to reduce the "Delay for backdate batches"?

      • phildunlap

        It sounds like you're in the many-values-few-points situation described. So, I would expect the larger, 60000 ms to fare better than the default 5000 ms. That you have reduced the time range will probably skirt the problem more significantly, though.

        • mihairosu

          You are correct, reducing those intervals does not create any issues. Ideally I would not sit down for hours, manually editing the intervals until I have all the data I want.

          Is there some way to throw more hardware at the problem, and change some settings to make more use of that hardware? As far as I can tell, it's some sort of software issue, since we are not running out of memory.

          Here's an htop during times when we've hit the Task Queue Full error:

          (screenshot: 0_1514921823392_mango hardware stats.png)

          • phildunlap

            Yes, I have done that too, it's not ideal. Did you try the whole range with the modified settings?

            Hmm. Can you check if you have any hs_err files in your Mango/ directory that may reveal the OOM? Otherwise your log should have some indication of what's gone astray. And, if it doesn't, you should definitely keep an eye on stderr.

            The task queue message is probably incidental to the crash. If you change your spawn threshold / max instances you can see this (force it to have max instances all the time with a low spawn threshold, the messages will go away, but it will probably still crash).

            It may help to purge your event, audit and userEvent tables.
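The idea that the "Task Queue Full" message is a symptom rather than the cause can be illustrated with a toy model (this is NOT Mango's actual implementation, just a sketch of the spawn-threshold / max-instances interaction): writer instances spawn when the backlog crosses a threshold, up to a cap, and once the cap is reached the warning fires while the backlog keeps growing anyway.

```python
# Toy model of a batch write-behind queue (not Mango's real code):
# writer "instances" spawn when the backlog crosses a threshold, up to
# a cap. Once the cap is reached, the "queue full" condition fires,
# but the backlog (and therefore memory) still grows whenever
# producers outpace the maximum write capacity.

def simulate(produce_per_tick, write_per_instance, spawn_threshold,
             max_instances, ticks):
    backlog, instances, queue_full_warnings = 0, 1, 0
    for _ in range(ticks):
        backlog += produce_per_tick
        if backlog > spawn_threshold and instances < max_instances:
            instances += 1                 # spawn another writer
        elif backlog > spawn_threshold and instances == max_instances:
            queue_full_warnings += 1       # nothing left to spawn
        backlog = max(0, backlog - instances * write_per_instance)
    return backlog, queue_full_warnings

# Producers outpace even the maximum number of writers: warnings fire
# and the backlog keeps growing regardless of the threshold setting.
backlog, warnings = simulate(1000, 200, 500, 3, 100)
```

Silencing the warning (e.g. by forcing max instances from the start) changes nothing about the underlying imbalance, which is why the crash happens either way.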

            • mihairosu

              Oh wow, I DO have a bunch of hs_err files... I didn't realize they'd be here. Perhaps they should go in the log folder?

              According to these logs, I am indeed running out of memory.

              I wonder if it could be that the memory spikes up only temporarily before being dropped. I was not staring at it the whole time, to be fair.

              Ok now we're getting somewhere!

              Thanks for that information.

              • mihairosu

                OK, I just noticed a problem of my own making.

                I had created an ext-available script with a new Java heap size, which was even larger than the real memory I had allocated to the VM.

                I must have forgotten all this information and messed around with the memory allocation.

                I've increased both, and made sure I have enough real physical memory allocated.

                That could have been causing some major problems.
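A quick sanity check for this misconfiguration can be scripted. The sketch below (the `"4g"` heap value is an example, not taken from any real config) parses an `-Xmx`-style size string and compares it against the machine's physical memory on Linux:

```python
# Sanity check: the JVM's -Xmx must fit comfortably inside physical
# memory, or the kernel's OOM killer may terminate the process long
# before Java itself reports an OutOfMemoryError.
import os
import re

def heap_to_mb(size: str) -> int:
    """Convert a -Xmx style size ('4g', '4096m', '2097152k') to MiB."""
    m = re.fullmatch(r"(\d+)([kKmMgG]?)", size)
    if not m:
        raise ValueError(f"unrecognized heap size: {size!r}")
    n, unit = int(m.group(1)), m.group(2).lower()
    factor = {"k": 1 / 1024, "m": 1, "g": 1024, "": 1 / (1024 * 1024)}[unit]
    return int(n * factor)

def physical_mb() -> int:
    """Read total physical memory from /proc/meminfo (Linux only)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) // 1024
    raise RuntimeError("MemTotal not found")

if os.path.exists("/proc/meminfo"):
    xmx_mb = heap_to_mb("4g")  # example -Xmx value, not a real config
    total = physical_mb()
    # Leave headroom for the OS, the JVM itself, and off-heap buffers.
    if xmx_mb > total * 0.8:
        print(f"-Xmx {xmx_mb} MiB is too close to physical {total} MiB")
```

The 80% headroom figure is a rule of thumb, not a Mango recommendation; the JVM's own overhead and off-heap allocations mean the process footprint always exceeds `-Xmx`.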

                • mihairosu

                  I'm checking out a current historical generation process, and I am not seeing memory usage increase above 4GB for the system, even though I've provided plenty.

                  Now it could be that the system just does not need that much, or maybe disk I/O is limiting database reads/writes.

                  In any case, I appreciate your insights, but I have another question:

                  Which NoSQL settings can I tweak to make sure more available memory can be used?

                  • phildunlap

                    Good catch! Definitely, if the allocation was above the point where the OOM killer would wake up, this would happen, even if we were never really using that much memory at one time, because the garbage collector wouldn't attempt to free anything during an intense computational moment like history generation.

                    Hmm, I've never looked into / thought of moving the hs_err files to the logs directory. Those are created by the HotSpot Java runtime - and they only occur if things have gotten off the rails, such as OOM, SegFault, etc. I'll bring it up to people, but usually we tell people to look in their ma.log file anyway, so I wouldn't expect that direction to reveal the hs_err files either. The advantage of them being in the Mango/ directory is that they're abnormal events so it's nice to have an easy reference to them, without having to look past lots of other log files.

                    • mihairosu

                      I am noticing one more interesting tidbit with htop.

                      I always see one virtual CPU pegged at 100% with none of the others really showing any work.

                      Perhaps this process is not running multi-threaded?

                      • phildunlap

                        If you are using the clocks in the interface, it'll run in a Jetty thread. You can run more than one at once, and it is multi-threaded in that sense. But what you would find is that after you get somewhere between 3 and 6 going, you cannot start any more and your interface isn't especially responsive. You are correct in inferring that a meta point's history generation, or a meta data source's history generation, is only going to use one thread.
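That "3 to 6 at once" observation suggests driving a small number of generation requests in parallel from the client side. In the sketch below, `generate_history` is a hypothetical stand-in for whatever call kicks off one point's generation (e.g. an HTTP request to Mango), and the XIDs are made-up examples:

```python
# Client-side parallelism for per-point history generation: each
# generation runs single-threaded on the server, so a small thread
# pool lets a few points generate concurrently without swamping the
# interface. `generate_history` is a placeholder, not a real API call.
from concurrent.futures import ThreadPoolExecutor

def generate_history(point_xid: str) -> str:
    # Placeholder: in practice this would issue the request for one
    # point and block until that point's generation completes.
    return f"{point_xid}: done"

point_xids = ["DP_meta_1", "DP_meta_2", "DP_meta_3", "DP_meta_4"]

# Keep the pool small (3-6) so the server interface stays responsive.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(generate_history, point_xids))
```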

                        • Phillip Weeks

                          @phildunlap said in NoSQL Task Queue Full:

                          The task queue message is probably incidental to the crash. If you change your spawn threshold / max instances you can see this (force it to have max instances all the time with a low spawn threshold, the messages will go away, but it will probably still crash).

                          Phil, would you elaborate a little on the above comment? What is considered a low threshold?
                          Our situation is similar to the one discussed in this thread, in that our systems have a limited number of points (< 300) and very lengthy periods (2-5 years at every minute, so about 100 million values a year). I noticed your remark about streaming as opposed to loading the entire period's values into memory. Are there any plans to move in this direction? I have tried your tuning suggestions above, and I believe you correctly suggested that shortening the time period is the most obvious solution, so I modified the run-loop script you offered in the other thread so that the period gets divided up and points are processed repeatedly and linearly, keeping things under control with a RuntimeManager.sleep(5000) thrown in for good luck if the pointsTBW gets out of control. Our system was consistently running out of memory before I read this thread, and now that I understand why a little better, crashes have been eliminated completely on metapoint regenerations. Combined with your tuning suggestions in this thread, the system works much better and we get hassle-free regeneration over lengthy periods of a year or more.
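The chunked approach described above can be sketched as a simple window loop. Here `regenerate` and `points_tbw` are hypothetical stand-ins for the actual Mango script calls (the real script uses the meta-edit DWR and `RuntimeManager.sleep`), and the window width and backlog limit are illustrative values:

```python
# Sketch of chunked regeneration: divide a long period into fixed-size
# windows and regenerate them one at a time, pausing whenever the
# write-behind backlog grows. `regenerate` and `points_tbw` are
# placeholders, not real Mango APIs.
import time

DAY_MS = 24 * 60 * 60 * 1000

def windows(start_ms: int, end_ms: int, width_ms: int):
    """Yield consecutive (from, to) windows covering [start_ms, end_ms)."""
    t = start_ms
    while t < end_ms:
        yield t, min(t + width_ms, end_ms)
        t += width_ms

def regenerate(frm, to):   # placeholder for the real generation call
    pass

def points_tbw():          # placeholder: current write-behind backlog
    return 0

# One year of history, one week at a time.
for frm, to in windows(0, 365 * DAY_MS, 7 * DAY_MS):
    regenerate(frm, to)
    while points_tbw() > 100_000:  # let the writer catch up
        time.sleep(5)
```

Keeping each window small bounds how many values are held in memory at once, which is why this eliminates the OOM crashes described in the thread.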

                          I also remember something discussed in another thread about automatically regenerating missing metapoint history whenever the source data sync runs; is there any motivation within your collective talents to create this feature?
                          Thanks in advance.

                          • phildunlap

                            Phil would you elaborate a little on the above comment, what is considered a low threshold?

                            In the context of that post, anything that would cause the maximum number of batch write behind instances to exist in regular operating conditions.

                            I noticed your remark about streaming as opposed to loading the entire period's values in memory, Are there any plans to move in this direction?

                            I can't say if anyone will want to change anything about it, but I began a REST controller that streamed values (or optionally loaded them all), generated multiple points' histories in one request, and provided the ability to cancel a history-generation task. There was some stuff left there to be finished / fixed up, but I'm sure a more feature-ful REST controller is on its way.

                            I also remember something discussed in another thread about automatically regenerating the missing metapoints whenever the source data sync runs; Is there any motivation within your collective talents to create this feature?

                            Possible, but unlikely. I don't think you should have meta points on the receiver end unless you really cannot have them on the publisher's side. You could already take that history generation loop I gave you (invokes the meta edit dwr from a loop of points to generate iirc) and place that into a set point event handler for the persistent receiver's sync completed event. So, in that sense it is possible and not my favorite idea, so I'm unlikely to encourage it further without good cause.

                            There's some discussion of this in this thread: https://forum.infiniteautomation.com/topic/3189/using-persistent-tcp-points-in-script-calculations
