Save
Saving
  • cmusselm cmusselm

    We installed Adopt OpenJDK v11.0.10.9 on our production environment last Thursday and for a couple days things looked good; CPU was steadily at or below 4%. On Saturday it jumped up to 16% and is holding there, which is the same behavior we saw with v14 of Adopt OpenJDK. Every few days or weeks the CPU jumps and stays there until it gets to 99% and we have to restart the app to get CPU back down.

    posted in User help read more
  • cmusselm cmusselm

    Thanks, Ralf. I'll see if/when we can switch to that and post an update.

    posted in User help read more
  • cmusselm cmusselm

    Unfortunately we're still seeing CPU increases after the updates above were made. The ma.log file is clean for the times that the CPU is increasing, so our updates to reduce the log size did work at least.

    Its always the Java process eating up the CPU. We're using OpenJDK14U-jdk_x64_windows_hotspot_14.0.1_7 We've been using this 6/15/2020. Any thoughts about the Java version and if we should look at using a different one?

    The CPU increases seem to happen when we are working to add new Modbus devices to Mango, configuring them to ensure communications are working as expected.

    Has any one else experienced anything like this, or is there something we may be doing wrong when getting new Modbus hardware online?

    Thanks,
    Chad

    posted in User help read more
  • cmusselm cmusselm

    Thanks, Craig, that explains some of the results I've seen.

    For the specific example I showed, the poll had a timeout of 4 seconds, and it was coming back in 3.2 seconds. Since there was no Modbus device actually connected to that IP, did Mango report the poll simply because it received TCP/IP info from the server?

    posted in User help read more
  • cmusselm cmusselm

    Hello,

    I have a question around Modbus communications, specifically the poll duration. While working on items in our test environment we enable a Modbus data source with the server’s home IP (127.0.0.1) and port of 502. The server has no Modbus device attached it to and no simulator or anything listening to port 502.

    1c7af5dc-ba9a-42cc-a447-5cc3d69e62c8-image.png

    As expected, the Slave Monitor is 0 since there is no Modbus device for Mango to talk to.

    What we get that is somewhat unexpected is successful polls. In the Runtime Status tab, the polls are succeeding, taking about 3.2 seconds each.

    c7f8bb68-6c98-4504-ae42-29667c13ba74-image.png

    We thought we’d see timeouts instead of successful polls. Are there any ideas why the polls aren’t timing out? Is the server sending TCP/IP info back to Mango, and that is triggering the successful poll?

    I did turn on I/O logging for the data source, but nothing gets logged outside of the data source being enabled and disabled.

    Thanks,
    Chad

    posted in User help read more
  • cmusselm cmusselm

    After doing more digging, here's what we have done to hopefully resolve this, or at least reduce the chance it happens again. We have made the updates, but have not rebooted production to have them take place. Once we do that, I'll let it run for a few weeks and hopefully remember to report back on how things are going.

    Thread Pools

    • We increased thread pools as originally suggested by Ralf

    API Calls

    • We could see that each <ma-get-point-value has its own API call, and the Watchlist made 2 API calls (1 to get the query string, and a 2nd to use the query string to pull all of the data points)
    • In our case, the Watchlist had grown to over 1,200 data points for various reason, and a majority of them did not need to appear on the UI.
    • The API call to get all of those data points from the Watchlist took ~6 seconds.
    • Because of this we refactored our UI to not use the Watchlist and only use <ma-get-point-value
    • Each one of these API calls ran for only ~100ms on average, and were performed asynchronously so overall it didn't take long to get the values

    WebSockets

    • Thanks to Craig's reply we know two different messages we've seen in the ma.log are related to websockets (log messages listed in the threads above)
      708a21bb-02f6-4291-a673-5d59ad9772fb-image.png
    • As seen in the screenshot, we had a websocket for both the Watchlist and our point-values.
    • Along the same lines as API calls, the elimination of using a Watchlist will eliminate unnecessary data point values from being pulled from the system
    • This way, if the websocket fails in the future, we won’t have to worry about 1000+ unneeded data point values failing and trying to write to the log

    Logging

    • In addition to the measures taken above, we have also made updates to not log warnings or info events in production for both org.eclipse.jetty and com.serotonin.m2m2
    • These are what wrote thousands of messages to the log file when the websocket failed
    • We believe it was writing to the log file that contributed to the CPU spike
    • With the log level increased to Error for these items, even if there is a websocket issue in the future, the info and warning messages should not be written and hopefully we won’t see an increase in CPU
      • Updated log4j2.xml file to have the <Loggers> level be error

    posted in User help read more
  • cmusselm cmusselm

    One other quick note I forgot to include...

    When I first tried this solution I was getting an error about a valid certificate path. I found this on Stack Overflow which resolve that issue: https://stackoverflow.com/questions/21076179/pkix-path-building-failed-and-unable-to-find-valid-certification-path-to-requ

    posted in User help read more
  • cmusselm cmusselm

    Thank you, Craig, and Ralf.

    That helps a lot to know that the messages are coming from the dashboard. We do have a couple views set up for the users to see widgets for the hardware with various data points displaying. This particular client has a good amount of devices in the field, so we're gathering quite a bit of info via <ma-get-point-value tags. We have 224 on one page and 483 on another. Seeing it typed out looks like a lot. Is that too many, or should I try to use a different method if needing more than x data points on a page?

    Quick question around Ralf's comment on disabling logging for a data source. I am not using the I/O log feature. Is there a way to make a data source not write data to the ma.log file?

    posted in User help read more
  • cmusselm cmusselm

    Thank you, Fox!!!! I got a proof of concept working and will be off and running now.

    For those that want a bit more info on what was done, here are the steps I took for this.

    1. Read Fox's post: https://forum.mango-os.com/topic/4834/how-to-get-json-data-via-api/2
    2. Read the help page on logging into Mango via the API: https://docs-v3.mango-os.com/mango-rest-api-authentication/

    In my case we were trying to hit the Mango API via a scripting data source within Mango. We used HttpBuilder as Fox indicated and gave a good outline of in his post.

    When determining the headers needed to send the request I mainly looked at the Mango help page linked to above.

    At first I tried to hit the runtime status endpoint directly and was getting unauthorized status back. This was because the resource hitting the API wasn't logged in. I then added the logic to log into Mango (following the first login example on the Mango help doc).

    Even after logging in via the API I still received 401 error when trying to hit the runtime status endpoint.

    After flailing around unsuccessfully for a day I decided to try the JWT method at the bottom of the help page. That worked perfectly, and easily! After setting up the JWT for a user, I was able to use the generated token and access the runtime status endpoint with no issues.

    posted in User help read more
  • cmusselm cmusselm

    Hello,

    I have a client that would like to see what % of the previous X polls were successful. I've tried many things, but can't get what he really wants. Hopefully I'm missing something obvious.

    We can look at the Runtime Status, or hit the /data-sources/status endpoint to get a snapshot of the last 10 successful and aborted polls. Any chance these are stored as a data point somewhere with the history?

    Here's what we've looked into so far:

    • I know there's the internal data points which include the Poll Success Percent, but that goes back to the time the data source was re-started. That won't give us a good picture of the polls over the past 1 or 2 minutes.

    • I've also looked into the "previous sequential successful polls" but that seems to have a bug. See this post I had on it for more details: https://forum.mango-os.com/topic/4948/question-on-internal-metric-previous-sequential-successful-polls/

    • Right now we're using the Slave Monitor 1/0 value to know the % of time over the past x seconds Mango has been able to communicate with the Modbus device, but that's not really what the client wants. That doesn't tell him if individual polls aborted.

    The only other solution I can think of is to hit the /data-sources/status API every x seconds, parse through the results and save them as data points.

    If we did that, is there a way to hit the Mango API from a Scripting Data Source? That would make it easier to schedule the job and save results into a data point. It doesn't look like jQuery or an XMLHttpRequest can be used in the scripting data source. If possible, I'd like to avoid custom jobs and code being written outside of Mango.

    Thanks,
    Chad

    posted in User help read more