Using PUT for data entry causes fatal error


  • Hi all, while inserting point values from third-party systems I've now twice hit a fatal error that crashed Mango and caused significant data loss.

    WARN  2019-11-26T16:26:33,622 (com.infiniteautomation.tsdb.impl.IasTsdbImpl.repairShard:1658) - Corruption in databse at /mnt/disks/sdb/mango/databases/mangoTSDB detected in series 10518 s$
    ERROR 2019-11-26T16:26:33,622 (com.infiniteautomation.tsdb.impl.CorruptionScanner.findCorruption:539) - java.io.FileNotFoundException: /mnt/disks/sdb/mango/databases/mangoTSDB/5/10518/733.$
    java.lang.RuntimeException: java.io.FileNotFoundException: /mnt/disks/sdb/mango/databases/mangoTSDB/5/10518/733.data.rev (Too many open files)
    FATAL 2019-11-26T23:33:11,477 (com.infiniteautomation.nosql.MangoNoSqlBatchWriteBehindManager$PointWrittenEntry.writeBatch:527) - Should never happen, data loss for unknown reason
    java.lang.RuntimeException: java.io.FileNotFoundException: /mnt/disks/sdb/mango/databases/mangoTSDB/65/8500/733.data.rev (Too many open files)
    

    What on earth can I do to prevent this? Note that I'm using individual data point value PUTs because I need the update event to fire so that other alerts and meta data points run.


  • I'm getting the feeling that whoever may be able to help me here is either incredibly busy or on holiday...


  • @MattFox you are correct, we were busy with the release of Mango 3.7.x. I'm making some assumptions here, but the "too many open files" problem can be caused by the ulimits being too low on a Linux system. Each process is assigned a limit on the number of files it can have open, and this will need to be increased on a Mango instance with a large number of data points. Since the database is sharded, there will be multiple files open for each data point when reading and writing to the database.

    Take a look at this post on how to increase these limits:

    https://forum.infiniteautomation.com/topic/2624/mango-locking-up-after-2-8-4-update/2
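As a quick way to see the soft and hard open-file limits discussed above, here is a minimal Python sketch using the standard `resource` module (Unix only). It reports the limits of the Python process itself, not of Mango; it is useful mainly because limits are inherited from whatever shell or service manager starts a process. The helper names `nofile_limits` and `fmt` are made up for this example.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # The soft limit is the one that triggers "Too many open files";
    # the hard limit is only the ceiling the soft limit may be raised to.
    print(f"soft={fmt(soft)} hard={fmt(hard)}")
```

Running this from the same account and shell used to launch Mango gives a rough idea of the limits a Mango process started there would inherit.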


  • OK, cool, so that's what it is. I didn't want to touch this before getting a second opinion.


  • Just an update. Mango crashed again.
    Can't work with this. Let me know what I can submit to sort this. We can't handle this crashing every few days.

    Fox

    EDIT: Looks like the ulimit values may not have taken hold as I had hoped... Now we wait...


  • @MattFox I am fairly confident in the source of your problem, but if the ulimit change does not fix it, you can submit a zip file of all your log files to our support email and I will review them in detail. Also, you probably have *.hprof files, which are generated when the JVM crashes; submit the latest one of those as a zip as well.


  • Thanks Terry, I hope so too. The soft limit was at 1024, yet the hard limit is 1048576, which greatly confused me.
    I've got 7800 points, though I do wonder if this means the cap gets hit really quickly if I have 1000000 files open which are scattered across a few thousand points. I'm not overly familiar with the structure of the NoSQL DB.
    After reviewing my settings I've managed to get the soft limit to 65535 while keeping the same hard limit. Part of me wonders if that limit should be higher...

    Sorry to be difficult

    Fox


  • @mattfox

    Hi, Matt
    I am just curious about your hardware configuration in this case. Is it a HTS or Mango running on your own Linux box?

    Victor


  • Virtual dedicated cloud server.

    To be sure, I've used prlimit to set Mango's open-file limits directly on the running process. For some stupid reason, when I logged in before, the ulimit maxes were back at 4096. Wish it'd make up its stupid mind... I may have to restart Mango to make it use the new limits I've set in /etc/security/limits.conf, but if it keeps working with no issues after setting the process limit, perhaps I'll be OK...

    Fox


  • Just crashed again. Mango is still working in the background and has 11500 files open.
    The web service crashed: I cannot view the dashboard or access Mango from the browser.
    The API is not talking either...

    Shoot!

    EDIT: Looks like the API keeps dropping in and out. CPU is high. Perhaps it's time to upgrade the machine's CPUs and memory.... Just managed to click back in to the dashboard...

    EDIT 2: Have upgraded the CPU to 6 cores. Things appear to be running much more smoothly. Will monitor and see how things go.
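An open-file count like the one reported in this post can be set against the limits the kernel is actually enforcing for that process. The sketch below is one way to do that on Linux by reading `/proc/<pid>/limits` and `/proc/<pid>/fd`. The function names are made up for this example, and `os.getpid()` is only a stand-in for the Mango JVM's pid, which you would have to look up yourself.

```python
import os
from pathlib import Path


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits, as strings.

    Strings are returned because either value may be 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', soft, hard, 'files']
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' row found")


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd (needs permission to read that process)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def report(pid):
    """Build a one-line summary of open descriptors versus limits for pid."""
    soft, hard = parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
    return f"pid {pid}: {count_open_fds(pid)} open, soft={soft}, hard={hard}"


if __name__ == "__main__":
    # Stand-in pid: replace os.getpid() with the pid of the Mango process.
    print(report(os.getpid()))
```

If the open count sits close to the soft value, the process is about to hit "Too many open files" regardless of how high the hard limit is.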


  • Curious: for 7800 points, how often are they read from the data sources, how often are they logged to disk, and how much RAM are you allowing for the JVM on the 6-core CPU?


  • 8 GB at the moment.
    I've allowed for more processes to run so Mango doesn't just sit still chewing RAM.
    I've yet to see it climb that high though. Higher CPU usage appears to be the culprit at the moment.


  • Just crashed again an hour ago.... -_-!
    Cannot see any .hprof files anywhere...

    I'm emailing logs now.

    Fox


  • @MattFox I assume this was handled via support?


  • Correct, I'll post my fixes here once I've implemented them if that's helpful


  • I've just had this error come up again. The hard limit for open files is obscenely high, so I need to know the best settings for Mango to ensure it doesn't crash with too many files open. I really could do with some advice on tuning the system. Things are not running as nicely as they were with earlier versions...

    Fox


  • I have learned that systemd applies its own limits to the services it starts, rather than the ones I had set for the OS in /etc/security/limits.conf. Despite all of my settings, systemd still loaded Mango with the default of 4096 max open files.
    Inside /etc/systemd/system/ I had to create the directory mango.service.d, then inside that a file named override.conf.
    In override.conf I had to add:

    [Service]
    LimitNOFILE=65536
    

    So now here is hoping I stop getting these errors... In the meantime I used prlimit -p [pid] -n4096:65536 to keep things going.

    EDIT: I still had to change this to -n65536:65536, since with the soft limit left at 4096 Mango kept throwing these errors...

    Fox
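The `prlimit` stopgap in this post has a counterpart in Python's standard library, `resource.prlimit` (Linux only). The sketch below is an illustration of that call, not something taken from Mango or its documentation. The helper name `raise_soft_nofile` is made up, and it assumes you have sufficient privileges (for example root) to change the target process. As with the command-line tool, the change applies only to the running process and is lost when the service restarts.

```python
import resource


def raise_soft_nofile(pid, target):
    """Raise (never lower) the soft open-file limit of pid toward target.

    pid 0 means the calling process. The hard limit is left unchanged,
    and the new soft limit is capped at the hard limit.
    """
    soft, hard = resource.prlimit(pid, resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to raise
    # Cap the requested value at the hard limit unless that is unlimited.
    cap = target if hard == resource.RLIM_INFINITY else min(target, hard)
    new_soft = max(soft, cap)
    resource.prlimit(pid, resource.RLIMIT_NOFILE, (new_soft, hard))
    # Read the limits back to confirm what the kernel now enforces.
    return resource.prlimit(pid, resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # Demonstration on the current process only; 65536 mirrors the value
    # used in the post and is not a recommendation.
    print(raise_soft_nofile(0, 65536))
```

Reading the limits back after setting them is deliberate: the thread shows how easy it is to believe a limit has been applied when the running process is still using an older value.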