Understanding ES Disk Usage and Free Space

Phillip Weeks

What would cause this ES to lose free space like this?
4.0K Getting Started.txt
20K LICENSE
40K RELEASE-NOTES
319M backup
136K bin
1.6M boot
136K classes
3.6G databases
68K db
576K filestore
112K ftl
96M lib
8.0K licenseTypes.xml
7.3M logs
12K m2m2.license.xml
64K overrides
4.0K release.signed
217M web
8.3M work

phildunlap

Hi Phillip,

Given that your databases directory is 3.6 GB, I would guess something in that directory has grown. We should probably run du -h on that directory too.
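To see which entry inside a directory has grown, a per-entry size listing helps. Below is a minimal Python sketch of that idea. The function names are hypothetical, and it sums apparent file sizes rather than allocated disk blocks, so its numbers can differ somewhat from what du reports.

```python
import os


def dir_size(path):
    """Total bytes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def largest_entries(path, top=10):
    """Return (size_bytes, name) for entries directly under path, largest first."""
    sizes = []
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            size = dir_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            size = entry.stat(follow_symlinks=False).st_size
        else:
            continue  # skip symlinks and special files
        sizes.append((size, entry.name))
    return sorted(sizes, reverse=True)[:top]


def human(n):
    """Format a byte count with binary units, e.g. 1536 -> '1.5K'."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024 or unit == "T":
            return f"{n:.1f}{unit}"
        n /= 1024
```

Calling largest_entries on the databases directory and printing each size through human() would show whether the H2 file or the mangoTSDB directory accounts for most of the space.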

If it's the mah2.h2.db file, the investigation would likely come down to working out which table has grown. Some of that is described in this thread: https://forum.infiniteautomation.com/topic/2813/invalid-login-error where an SQL backup followed by deleting the database and restoring it shrank the H2 database significantly.

I do wonder which of the tables grew in your case (events, userEvents, and audit have been the culprits in the past). If you want to share count(id) from those tables, that could be interesting. If you feel duly diligent, you could also post some rows from the table that grew (from events, if it is userEvents) covering that insertion period.
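As an illustration of the row counts being asked for, here is a small Python sketch. It uses an in-memory SQLite database purely as a stand-in so the example can run anywhere; that is an assumption for demonstration and not how Mango's H2 database is accessed. The table names are the three mentioned above, and row_counts is a hypothetical helper. The SELECT COUNT(id) statement itself is plain SQL and could be run from whatever SQL console is available.

```python
import sqlite3

# Table names taken from this thread; everything else here is illustrative.
TABLES = ("events", "userEvents", "audit")


def row_counts(conn, tables=TABLES):
    """Return {table: COUNT(id)} for each named table."""
    counts = {}
    cur = conn.cursor()
    for table in tables:
        # Table names come from the fixed tuple above, not from user input.
        cur.execute(f"SELECT COUNT(id) FROM {table}")
        counts[table] = cur.fetchone()[0]
    return counts


if __name__ == "__main__":
    # Stand-in database with empty tables and a few fake event rows.
    conn = sqlite3.connect(":memory:")
    for t in TABLES:
        conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO events (id) VALUES (?)",
                     [(i,) for i in range(5)])
    print(row_counts(conn))
```

A table whose count is orders of magnitude larger than the others is the likely source of the growth.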

If it's the mangoTSDB that is large, that is quite odd, since it looks like this disk usage occurred during the purge. The most likely candidate there would still be data insertion, whether from a runaway script or a sudden reconnection on the persistent data source.

Phillip Weeks

Hmm, lots of food for thought in that response, Phil. My first inclination would be to try the backup and restore and see if anything shrinks. However, this is the ES that ended up with its licence missing, so I can't even speculate right now.
4.2G 4.2G databases
344M 344M databases/mangoTSDB

phildunlap

I responded to your email about that. I bet that purging those three tables, doing a backup, stopping Mango, starting on a new SQL database, and restoring will shrink that databases directory very significantly. But I do wonder which table grew so much.

Phillip Weeks

I restored the backup and sure enough the disk use dropped down to 38%, with databases at 240M. However, during the first hour the ES reported another corruption issue and repaired a shard, which seems to indicate a recurring physical issue, does it not?

phildunlap

No. If your disk was full, the streams that write to the shards probably flushed only a portion of their contents, leaving the last sample in the shard incomplete. The database would detect this and run a corruption scan on that shard while running, as required. Now, if the corruption error says BufferOverflowException, the cause is the same, but you will have to set db.nosql.runCorruptionOnStartupIfDirty=true in env.properties and restart Mango to repair that particular shard. I imagine we'll handle that case automatically before too long, but it's somewhat uncommon.
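For anyone scripting that change, here is a minimal Python sketch that sets a key in properties-style text. The set_property helper is hypothetical and deliberately simple: it only understands key=value lines and # or ! comments, not the full Java properties syntax (no ":" separators or line continuations). It works on text rather than a file because the location of env.properties depends on the installation.

```python
def set_property(text, key, value):
    """Return properties text with key set to value.

    Replaces the first uncommented `key=...` line, drops later duplicates,
    and appends the setting if the key was not present.
    """
    out = []
    replaced = False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        name = stripped.split("=", 1)[0].strip()
        if not is_comment and "=" in stripped and name == key:
            if not replaced:
                out.append(f"{key}={value}")
                replaced = True
            # any further duplicate of the key is dropped
        else:
            out.append(line)
    if not replaced:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# The property name below is the one quoted in this thread.
KEY = "db.nosql.runCorruptionOnStartupIfDirty"
```

Reading the existing file, passing its contents through set_property(contents, KEY, "true"), writing the result back, and then restarting Mango would apply the setting. Keeping a copy of the original file first is a sensible precaution.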