


    Mango locking up after 2.8.4 update

    • mattonfarm

      Hi all,
      I am having issues with the latest update.
      It seems like after a random amount of time (usually around 30mins) Mango gets stuck in some sort of loop, consuming 100% of the CPU and not allowing HTTP access.
      Attached is a screenshot of the Thread Monitoring page showing the runaway thread.

      [Screenshot: thread monitoring.jpg]

      This corresponds to what I see in top

      [Screenshot: top.jpg]

      I'm getting a number of these errors in ma.log, which may be the cause of the lockup or may be a side effect of it.

      ERROR 2017-01-06 14:17:39,989 (com.infiniteautomation.datafilesource.rt.DataFileDataSourceRT.doPoll:269) -
      java.lang.NullPointerException
      at com.infiniteautomation.datafilesource.rt.DataFileDataSourceRT.loadNewFiles(DataFileDataSourceRT.java:182)
      at com.infiniteautomation.datafilesource.rt.DataFileDataSourceRT.doPoll(DataFileDataSourceRT.java:259)
      at com.infiniteautomation.datafilesource.rt.DataFileDataSourceRT.doPollNoSync(DataFileDataSourceRT.java:250)
      at com.serotonin.m2m2.rt.dataSource.PollingDataSource.scheduleTimeout(PollingDataSource.java:134)
      at com.serotonin.m2m2.util.timeout.TimeoutTask.run(TimeoutTask.java:69)
      at com.serotonin.timer.TimerTask.runTask(TimerTask.java:148)
      at com.serotonin.timer.OrderedTimerTaskWorker.run(OrderedTimerTaskWorker.java:29)
      at com.serotonin.timer.OrderedThreadPoolExecutor$OrderedTask.run(OrderedThreadPoolExecutor.java:278)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      Logs attached: ma.log

      I've performed a clean install leaving only the databases and db folders behind and am still having issues.
      I am running a Persistent TCP data source to sync with a remote MangoES.

      Rebooting Mango seems to sort things out for 30 mins or so.

      Cheers,
      Matt.

      • phildunlap

        Hi Matt,

        I think the error you posted is probably a symptom of the problem. It would be possible to check for this condition in the code (and we probably ought to, so thank you for bringing it to our attention), but it's unlikely to be the underlying issue.

        The more interesting message to me in that log is that you've got too many open files. I wonder what all the open files are, and what the limit is. These commands assume only one Java process exists. If you have more on your server, you could do ps $(pidof java) | grep overrides and you will probably find the pid for Mango.
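        For instance, something along these lines (just a sketch) should list the java processes with their full command lines so you can spot Mango's, using the same "overrides" match as above:

        ps -ef | grep java | grep overrides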

        To get an output of open files:

        lsof -p $(pidof java) > ~/lsof-output
        #then to count the files
        wc -l ~/lsof-output
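
        If you want a rough idea of what those open files actually are, something like this (just a sketch) will group them by name:

        #group the lsof output by the NAME column (9th column in default lsof output)
        awk '{print $9}' ~/lsof-output | sort | uniq -c | sort -rn | head -n 20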
        

        To get the hard limit for the user,
        ulimit -Hn
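
        The soft limit, which is often the one that actually applies, can be checked the same way:

        ulimit -Sn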

        To set the limit higher if necessary,

        #you'll first have to modify /etc/security/limits.conf most likely, create/modify the "<user> hard nofile 65535" line (where <user> is the account Mango runs as)
        #then you can create a script for Mango/bin/ext-enabled/ which does:
        ulimit -Hn 65535
        #You will need a new SSH session for that limit to get applied.
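
        To make that concrete, the ext-enabled script could be as simple as this (a sketch only; the file name ulimit.sh and the "mango" user are just examples for illustration):

        #contents of Mango/bin/ext-enabled/ulimit.sh (file name is only an example)
        #assumes a matching "mango hard nofile 65535" line in /etc/security/limits.conf
        ulimit -Hn 65535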
        

        Another way to see the limits applying to a pid,

        cd /proc/$(pidof java)/
        cat limits
        #Most interesting line: Max open files 4096 4096 files
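
        Or, to pull out just that line (again assuming a single java process):

        grep "Max open files" /proc/$(pidof java)/limits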
        
        • phildunlap

          My expectation is ulimit -Hn will probably say 4096, and ulimit -Sn may only give you 1024. You are probably using the NoSQL database and have ~500+ points, and are perhaps hitting this limit during startup. You may be able to simply add an ext-enabled script containing ulimit -Hn 4096, in which case you could avoid doing anything in /etc/security/limits.conf.
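
          One quick sanity check, if you still have the lsof output from earlier, would be counting how many of the open files live under the databases folder you mentioned (just a sketch; adjust the pattern to your install's path):

          #rough count of open files under Mango's databases folder
          grep -c databases ~/lsof-output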

          • mattonfarm

            Hi Phil,
            I think you're right about the error being a symptom of something else.
            Looking at the data, I noticed that one of the persistent data points showed data only up until the date the Mango server started to crash, while data on the MangoES was still being recorded. I'm guessing something had caused corruption in the database and a historical sync would get hung up on that data point, since clearing all data point data and doing a complete re-sync with the MangoES on site seems to have solved things.

            I'm guessing the file limit issue is due to too many hung-up historical sync threads running. I'll look into the user limits though.

            I'd still like a way to work out what's going on if this happens again, and how to fix it, especially if Mango is unable to start at all.

            Many thanks for your support.
            Matt.

            • phildunlap

              Hi Matt,

              I can't say for sure if it's related or not, but I have placed a new version of the NoSQL module into the store. I was guided to the change I made by investigating a description of your events, though, so perhaps it is related, and it could conceivably produce symptoms like what you describe (too many open files, apparent corruption) in unfortunate circumstances. I'd encourage you to update!

              • mattonfarm @phildunlap

                @phildunlap Thanks Phil. I'll do the update and let you know how I get on.
                Interestingly I seem to be getting the following error repeating over and over, sometimes only 10 or so seconds apart.

                High priority task: com.serotonin.m2m2.persistent.ds.PersistentDataSourceRT$StatusProvider was rejected because it is already running.


                This is on the data source end or the persistent data source publisher. Could this be an issue of corruption?

                • phildunlap

                  Hi Matt,

                  I had actually just realized there was a small error in what I made available. I'm coding tests for the fix right now, and I will have another module in the store before the end of my day.

                  I wouldn't typically worry about that StatusProvider task getting rejected right now. It could be a symptom, though it is happening a lot. After I update the module, we can check whether that event is still occurring so frequently.

                  • phildunlap

                    Hi Matt,

                    I have made 1.3.4 of the NoSQL database available. 1.3.3 (the version I made available two days ago) should be abandoned quickly by anyone who may have updated to it.
                    Thanks!
