I'm looking into reading over 2,000 new datapoints, each reporting once per minute, and storing them either in Mango or in another time-series database. Most values are 16 bits wide, but many may be only 8 bits wide, and all could be stored with the same timestamp each minute. This is a lot of data, so I'm looking to store it efficiently, and then probably analyze and visualize it with the Anaconda Python toolset.
I currently have almost 1,300 enabled points in a Mango installation. I recommend the command-line tool jfq for running JSONata queries to quickly answer questions like this:
$ jfq '$count(dataPoints[enabled=true].[name])' Mango-Configuration-Aug-01-2019_000500.json
The .[name] part is not needed, but I like to list the point names as a sanity check before invoking the count.
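The same count can be reproduced in plain Python for the Anaconda workflow mentioned above. This is only a sketch: it assumes the export contains a top-level `dataPoints` array whose entries carry `enabled` and `name` fields, as the JSONata query implies.

```python
import json

def count_enabled_points(path):
    """Count enabled data points in a Mango JSON configuration export.

    Assumes a top-level "dataPoints" array whose entries have
    "enabled" and "name" fields (matching the JSONata query above).
    """
    with open(path) as f:
        config = json.load(f)
    enabled = [p["name"] for p in config.get("dataPoints", []) if p.get("enabled")]
    for name in enabled:  # list the names as a sanity check
        print(name)
    return len(enabled)
```

Pointing this at the same export file should agree with the jfq result.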
This brings me to Mango's NoSQL time-series database:
- How does Mango's time-series database store binary or numerical values?
- Are all point values stored as 64 bits? Even binary values?
- Is each TSDB entry 128 bits (a 64-bit timestamp plus a 64-bit value)?
Each group of four 16-bit readings would be transmitted packed into a 64-bit word, with the same timestamp from each data source, either over MQTT-SN into a broker and on to Mango's MQTT client subscriptions, or over CoAP and REST PUTs. The four stored numbers can then be used to generate the additional two numbers I need for each timestamp, so those values can be calculated upon retrieval; storing them would be redundant and bloat storage needs by another 50%. (Incidentally, if you ever integrate an MQTT broker into Mango, please be sure it supports MQTT-SN, as the Mosquitto broker does.)
To summarize, I could pack four 16-bit values into a 64-bit pointvalue, and then generate the remaining two points when I read and unpack the pointvalue, thus generating my desired six 16-bit values. Unpacking could almost be done with metadata functions, but metadata functions are triggered when the data enters Mango rather than when the data is retrieved from Mango's TSDB.
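The packing and unpacking step itself is simple bit shifting. A minimal Python sketch, assuming reading 0 occupies the low 16 bits (the field order is my own convention, not Mango's):

```python
def pack4(a, b, c, d):
    """Pack four 16-bit readings into one 64-bit value.

    Reading `a` goes in the low 16 bits, `d` in the high 16 bits;
    this ordering is an arbitrary convention for the sketch.
    """
    for v in (a, b, c, d):
        assert 0 <= v <= 0xFFFF, "each reading must fit in 16 bits"
    return a | (b << 16) | (c << 32) | (d << 48)

def unpack4(word):
    """Recover the four 16-bit readings from a packed 64-bit value."""
    return tuple((word >> shift) & 0xFFFF for shift in (0, 16, 32, 48))
```

The two derived points would then be computed from the unpacked tuple at retrieval time rather than stored.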
- Have you considered implementing ad-hoc metadata functions, where the data is generated when it is retrieved rather than when it is stored? This would reduce the need to store data that can simply be calculated from other stored points. Depending on the ad-hoc metadata point's logging properties, calculated points could either be stored or recalculated each time they're retrieved. Metadata point calculations can already be performed on retrieved stored data by clicking the metadata point source's "Generate history" function, but that is not ad-hoc: values are not automatically generated when the metadata point is queried for data it does not have.
The most efficient storage in my case would be a layout where each 2,000-element row begins with a single timestamp and each column is a different data value. Since the timestamp is the same for all the points, it could be stored only once per row, avoiding the need to write 2,000 identical timestamps to disk. Rare missing values could be stored with a sentinel such as NaN. Calculating the two additional points per group of four stored points upon retrieval would further reduce storage demands by one third.
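To illustrate that layout, here is a pandas sketch of a table with one timestamp index shared by all columns; the point names and values are invented:

```python
import numpy as np
import pandas as pd

# One row per minute, one column per point; the timestamp is stored
# once per row (as the index) rather than once per point.
timestamps = pd.date_range("2019-08-01", periods=3, freq="min")
table = pd.DataFrame(
    {
        "voltage_1": np.array([230.0, 231.0, 229.0], dtype=np.float32),
        # NaN marks a rare missing reading
        "voltage_2": np.array([229.0, np.nan, 230.0], dtype=np.float32),
    },
    index=timestamps,
)
```

With 2,000 columns of 16-bit values, an on-disk columnar format would store one timestamp plus 2,000 narrow values per minute instead of 2,000 (timestamp, value) pairs.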
- Is there any way to handle point arrays in Mango, where each array index is a different point? e.g. voltage, voltage, ... voltage?
- As the data ages, it could be down-sampled (decimated) from 1-minute intervals to 10-minute intervals. Most time-series databases support automatic downsampling. Are there any plans to add automated downsampling to Mango? Purging should not be the only option for old point data.
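For reference, this kind of decimation is a one-liner in pandas (a sketch; the mean is just one possible aggregate, and min/max/last would also be reasonable choices):

```python
import pandas as pd

# Down-sample 1-minute readings to 10-minute means.
minute_index = pd.date_range("2019-08-01", periods=60, freq="min")
series = pd.Series(range(60), index=minute_index)
ten_minute = series.resample("10min").mean()
# 60 one-minute samples collapse into 6 ten-minute buckets
```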
I've considered using InfluxDB. Although it has downsampling policies, it does not appear to support arrays; the closest data element it supports is a 64-bit number. Consequently, I would still have to pack and unpack my 16-bit values, which makes it unlikely that visualization tools could render the stored data directly. This forces me to choose between direct visualization and storage efficiency.
Thank you for your input.