


    Potential Memory Leak

    User help
      adamlevy
      last edited by

      The issue persists. I am really not sure what to do here. I can't move forward with any other work while our server is failing every 40 minutes.

      I am willing to share some access to our instance of Mango or get on the phone to talk this through if that helps.

      This is Adam from iA3 BTW.

        adamlevy
        last edited by

        I just realized that the memory-small.sh extension was enabled. I'm bumping it up to medium to see if that helps.

          terrypacker
          last edited by terrypacker

          Adam,

          I'd be a little wary of hitting the /v2/server/system-info endpoint frequently; some of the data returned is computationally intensive for Mango to calculate. For example, it will compute the database size by recursively accessing every file to get its size. For the NoSQL database there will be 1 file for every 2-week period in which a data point has data.

          I would strip down the request to only get what you want:

          GET /rest/v2/server/system-info/noSqlPointValueDatabaseStatistics
          GET /rest/v2/server/system-info/loadAverage
          

          I would avoid requesting noSqlPointValueDatabaseSize because of the intensity of the request on the server.
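
          As a rough, untested sketch of polling just those two endpoints from Python with the requests library (the base URL and the session/authentication handling below are placeholders for whatever your setup uses; only the two endpoint paths come from above):

          import requests

          MANGO_URL = "http://localhost:8080"   # placeholder base URL, adjust for your instance
          session = requests.Session()          # authentication against Mango omitted here

          # Request only the specific system-info items we actually need
          ENDPOINTS = [
              "/rest/v2/server/system-info/noSqlPointValueDatabaseStatistics",
              "/rest/v2/server/system-info/loadAverage",
          ]

          for path in ENDPOINTS:
              resp = session.get(MANGO_URL + path, timeout=10)
              resp.raise_for_status()
              print(path, resp.json())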

            adamlevy
            last edited by

            Thanks for the advice! I will reduce how frequently I hit that endpoint and make the query more specific.

              terrypacker
              last edited by terrypacker

              In addition to those metrics you can also request all of the information found on the InternalMetrics page via the /rest/v1/system-metrics/ and /rest/v1/system-metrics/{id} endpoints.

              The most useful of these for your current problem would be the id of com.serotonin.m2m2.db.dao.PointValueDao$BatchWriteBehind.ENTRIES_MONITOR which will show you how many values are currently waiting to be written to the database (cached in memory).

              This information can also be logged by the Internal Metrics data source.
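
              As a rough, untested sketch (the base URL and session handling are placeholders; the metric id is the one above), reading that monitor from Python with requests could look like:

              import requests

              MANGO_URL = "http://localhost:8080"   # placeholder base URL
              METRIC_ID = "com.serotonin.m2m2.db.dao.PointValueDao$BatchWriteBehind.ENTRIES_MONITOR"

              session = requests.Session()          # authentication against Mango omitted here

              # Current number of point values waiting to be written to the database
              resp = session.get(MANGO_URL + "/rest/v1/system-metrics/" + METRIC_ID, timeout=10)
              resp.raise_for_status()
              print(resp.json())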

                adamlevy
                last edited by

                Awesome! I will check that out. Is there any way to view the Swagger interface for both v1 and v2 without restarting? Or can the Swagger interface only be enabled for one version at a time?

                  terrypacker
                  last edited by terrypacker

                  To see both in Swagger just set:

                  swagger.mangoApiVersion=v[12]
                  

                  You must restart to see the changes. Also, Swagger isn't really designed for use in production environments, especially if you are running thin on memory, as it will eat up some of your precious RAM.

                    adamlevy
                    last edited by

                    Good to know. Thank you again. So far Java hasn't run out of memory with the memory-medium extension enabled, but I'm also hitting 98% system memory usage and starting to use swap. The response times are still OK, though.

                    I increased my query interval from 10s to 90s. I don't think this is the cause of the issue, but it won't hurt to hit that endpoint less frequently. I will need to reconfigure Telegraf to grab just the metrics I want.

                    The points waiting to be written are high, but they are staying a bit lower than they were before. I think they'll stay high for a while, as long as Mango is catching up on historical point values. They are peaking around 10k, whereas before they were hitting upwards of 15k.

                    I intend to disable swagger for production but I have been experimenting with it there as I was instrumenting Mango. Thanks for the reminder though.

                      terrypacker
                      last edited by

                      It definitely sounds like you don't have enough memory for your configuration. If you allocate the JVM too much memory you run the risk of having the process get killed by the OS.

                      If you intend to run with 4GB of system memory, I would take a look at throttling the Persistent publishers via the setting on the receiving Mango. Phillip suggested setting it to 5 million, but it seems like your system would run out of memory before there are 5 million values waiting to be written. I would keep an eye on that value and see when you start to experience GC thrashing (high CPU and OOM errors in the logs), then set the throttle threshold below that number of values waiting.

                      From the graph you posted you could set it to 10,000 (but that was with less memory so the value is going to be higher now).

                        adamlevy
                        last edited by

                        Yeah, I was just coming to the same conclusion myself. I wanted to let it run and see how it handled it, but I can see that I just need more memory. I'm bumping it up to a t2.large with 8G of memory. It was actually not crashing even at 99% memory usage, but swap usage was climbing toward 50% of the 2G of swap. We'll see how this performs now...

                        Thanks for your continued help with this.

                          adamlevy
                          last edited by

                          So while increasing the size of the EC2 instance and switching to the memory-medium option has allowed us to catch up on the historical points, I am still noticing a significant memory leak. Here are graphs of our system stats over the past 5 days since I started running Mango on a t2.large instance.
                          [Image: mango-stats-latest.png — system stat graphs for the past 5 days]

                          We are now working with 8G, and the memory-medium option tells Java it can use 5G. I have been watching the memory usage steadily climb, with periodic jumps once a day around the time when our persistent TCP data sync is scheduled and we get a surge of points.

                          Why does the memory consistently grow? This made the system unresponsive again for me at a critical time when I had to demo the system for a potential client. Are we doing something wrong here?

                          Thank you

                          Adam

                            JoelHaggar
                            last edited by

                            What version of Mango is this? We released Mango 3.3 about a week ago, which might improve this.

                              adamlevy
                              last edited by

                              This is running core 3.3.1 ATM. I see that the latest core is 3.3.3 and I will upgrade tomorrow.
