Google Cloud Big Query module for accessing/storing data

phildunlap

Hi Fox,

Can you say more? It's already possible to do this either on the same machine (which I guess wouldn't enable updating the OS) or on multiple machines. If you have the same data points table (same IDs), you can use the NoSQL merge tool to pull in the data collected by the redundant unit any time it begins polling, or if the tables are not the same, the "Mergration" tool is able to pull in the data from another Mango based on the XIDs of the data points in the databases matching.

So, you would have a no update event on the second Mango (whether on a received point or on something like an HTTP retriever polling the login page) that enables all the data sources in the redundant Mango, and disables them all in the inactive script. You could take it a step further and trigger the primary Mango to merge or mergrate automatically.

I wouldn't worry about the second Mango requiring all the data all the time, if that's acceptable. If not, then you'll want to use persistent publishing, and you'd have to either create point links from all received persistent points to their normal points, or have a script transfer the data from the persistent points to the normal points and clear the history of the persistent points (that's probably how I would do it). Or attempt a fancier solution to have consistent data in the redundant points (solutions like having both Mangos actually share a network drive for the TSDB directory; you would probably want to be sure the primary Mango and the secondary Mango weren't writing at the same time, in that case).

MattFox

Sure Phil happy to!

The idea is that only one Mango unit will be writing at any one time. I'll likely have some form of redundant network configured in order to switch between the two systems when I take one down.
I want to focus on using Mango more as a control/data collection system and having the means for my customers to access their data from a different server.
However I cannot do this if it's being stored in a nosql system that only mango can access. I have considered using cloudSQL as it would mean I could have two live systems reading the data but it does not scale well long term and I can envision my data tripling within the next four to six months.
By doing this, it takes the overhead away from mango for retrieving datasets and will allow me to segment my system a little better.
I've considered using a publisher but I already have a fair few going. That and I'm not sure how well it will work if I'm pushing the updates of nearly 3000 points...

phildunlap

Always a good plan. Although, one could argue it's overkill if you're automating someone's home greenhouse, for instance.
The API can enable you to get the values out. Querying the API is lighter than using the UI (since the UI is built on top of the API + registering for websockets). We have done tests with InfluxDB in the past as a backing store for the point values table, but we got better performance using our NoSQL database than theirs (in the test we were running, >300000 values per second, their DB would eventually bog down doing some kind of administration / compaction and the data fire hose would drown it), and we have access to the source code, so it was better for us to choose our own NoSQL database. But, you may be able to make arguments for supporting another database backend. The NoSQL database has a sustained performance of wicked fast. What data tools would users be accessing from?
Always a good decision. You could have a cloud instance of Mango they access it through. That's definitely a good architecture.
We have quite a few systems with a lot of points (more than 3000). If you tune the overlap window in the persistent publisher it's pretty dang efficient, and I wouldn't worry too much about it unless your system is already bogging down sometimes. But, this would be determined by the size of the machine running Mango. A big machine will handle 3000 points (I guess it also matters somewhat what the points are) no problem.

It sounds like you're talking about three Mangoes to me. Two on site, with one being a redundancy if you're updating or the primary goes down, and one in the cloud. Then the one in the cloud needs to be able to serve data to your clients the way they want to consume it. So, what way would they want to consume it?

phildunlap

Rereading the title, I think I may have been missing something in my previous posts. I am familiarizing myself with Big Query a little better.

MattFox

@phildunlap said in Google Cloud Big Query module for accessing/storing data:

Always a good plan. Although, one could argue it's overkill if you're automating someone's home greenhouse, for instance

Without going into too much detail of the company model, let's just say we're in the agricultural sector in multiple areas.

@phildunlap said in Google Cloud Big Query module for accessing/storing data:

The API can enable you to get the values out. Querying the API is lighter than using the UI (since the UI is built on top of the API + registering for websockets).

This could be quite beneficial for me. I'd just have to work out what language/framework to use to provide what our customers need...

@phildunlap said in Google Cloud Big Query module for accessing/storing data:

The NoSQL database has a sustained performance of wicked fast. What data tools would users be accessing from?

Good to know. It's just that since users access their data directly from a web browser (desktop/mobile), it would be good to provide read access with minimal overhead. I am finding that there are issues with connecting to mangoUI via a rural internet connection. Trying to load the now condensed-down mangoservice and mangoUI files generally throttles the connection and stalls it...

@phildunlap said in Google Cloud Big Query module for accessing/storing data:

We have quite a few systems with a lot of points (more than 3000). If you tune the overlap window in the persistent publisher it's pretty dang efficient, and I wouldn't worry too much about it unless your system is already bogging down sometimes.

Where can I read up on this, or can you email me a guide? Sometimes what's in the Mango help and on the help.infiniteautomation.com site varies a little. It would be good to get as much as I can out of my system. (Currently two cores with 15GB of memory)

@phildunlap said in Google Cloud Big Query module for accessing/storing data:

It sounds like you're talking about three Mangoes to me. Two on site, with one being a redundancy if you're updating or the primary goes down, and one in the cloud. Then the one in the cloud needs to be able to serve data to your clients the way they want to consume it. So, what way would they want to consume it?

Technically, one on site for the customers that want/need it. A server in google cloud and another based here.
I'm doing my darnedest to automate and streamline as much as I can, as I'm a one man band. Hence my arguable over-concern for redundancy.

@phildunlap said in Google Cloud Big Query module for accessing/storing data:

Rereading the title, I think I may have been missing something in my previous posts. I am familiarizing myself with Big Query a little better.

Thanks, hugely appreciate you taking me seriously.

MattFox

Just thought I'd be difficult and bump this thread....
Sorry, I know you're bogged down at the moment, but does it seem feasible to do? Alternatively, would it be something I could write based on your existing SQL module database code?

Thanks again

Fox

phildunlap

Hi Fox,

It's probably not a feasibly sized chunk of work to accomplish in a forum thread. But, I spent a little time this morning sketching out a module that I think would be the right direction to travel in if one were to support this. We may be able to work through it together. Really only the BigQueryPointValueDao class would need extensive fleshing out to get some testing done. I tried to take some of their code for authenticating and whatnot, but didn't attempt to run anything or get credentials. I would guess what would need to happen is getting authenticated and having that session connection available to the BigQueryPointValueDao so that it could do its queries. Odds are if it was going to be a real module you would want some kind of round-robin batching of writes like we do for the NoSQL module (but that code isn't open source currently) or a batch write-behind task like the SQL one.
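
To give a rough idea of what a batch write-behind could look like, here is a minimal sketch. It is not the NoSQL module's batching and not the core's SQL write-behind task; PointSample, WriteBehindBatcher and the sink callback are hypothetical names made up for illustration. The sink stands in for whatever would actually send a batch of rows to BigQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** One sample for one data point. Hypothetical type, not a Mango class. */
final class PointSample {
    final String xid;
    final long timestampMs;
    final double value;

    PointSample(String xid, long timestampMs, double value) {
        this.xid = xid;
        this.timestampMs = timestampMs;
        this.value = value;
    }
}

/** Buffers samples in memory and hands them to a sink in batches. */
final class WriteBehindBatcher {
    private final int maxBatchSize;
    private final Consumer<List<PointSample>> sink; // assumed: performs the real write
    private final List<PointSample> buffer = new ArrayList<>();

    WriteBehindBatcher(int maxBatchSize, Consumer<List<PointSample>> sink) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    /** Queue one sample; flushes as soon as the buffer reaches maxBatchSize. */
    synchronized void add(PointSample sample) {
        buffer.add(sample);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Send whatever is buffered as one batch. A timer could call this too. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<PointSample> batch = new ArrayList<>(buffer);
        buffer.clear();
        sink.accept(batch);
    }

    /** Number of samples waiting to be written. */
    synchronized int pending() {
        return buffer.size();
    }
}
```

In a real module you would also want a periodic flush so slow points don't sit in the buffer, and some retry handling, since this sketch drops the batch if the sink throws.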

I used the NoSQL architecture because that will allow you to still use H2 or MySQL as the database backend and swap in the NoSQL DAO (data access object) for the point values table (and technically some others, like the reportInstanceData table). I would think the PointValueDaoSQL in the core would be an okay reference for any SQL you would have to write for this DAO.
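
The swap-in idea can be sketched roughly like this. To be clear, the interface below is a made-up, stripped-down stand-in and not the real PointValueDao contract from the core; PointValueStore and InMemoryPointValueStore are hypothetical names. The point is only that callers talk to the interface, so the class behind it (SQL, NoSQL, or something BigQuery-backed) can change without touching them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical storage contract; NOT Mango's real PointValueDao interface. */
interface PointValueStore {
    void save(String xid, long timestampMs, double value);

    /** Up to 'limit' most recent values for the point, newest first. */
    List<Double> latest(String xid, int limit);
}

/** In-memory stand-in; a BigQuery-backed class would implement the same interface. */
final class InMemoryPointValueStore implements PointValueStore {
    private static final class Row {
        final long timestampMs;
        final double value;

        Row(long timestampMs, double value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    private final Map<String, List<Row>> rowsByXid = new HashMap<>();

    @Override
    public void save(String xid, long timestampMs, double value) {
        rowsByXid.computeIfAbsent(xid, k -> new ArrayList<>()).add(new Row(timestampMs, value));
    }

    @Override
    public List<Double> latest(String xid, int limit) {
        // Copy before sorting so the stored insertion order is left alone.
        List<Row> rows = new ArrayList<>(rowsByXid.getOrDefault(xid, Collections.emptyList()));
        rows.sort((a, b) -> Long.compare(b.timestampMs, a.timestampMs)); // newest first
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, rows.size()); i++) {
            out.add(rows.get(i).value);
        }
        return out;
    }
}
```

Code that only holds a PointValueStore reference never needs to know which backend it got, which is the same reason the rest of the tables can stay in H2 or MySQL.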

This code has not been run and only serves educational purposes in its current state. It's an Eclipse project.
https://forum.infiniteautomation.com/assets/uploads/files/bigquery_8-21-18.zip

phildunlap

Also of note, I believe BigQuery has a JDBC driver that you would be able to use with an SQL data source if that could be made to suit your ambitions.

MattFox

Thanks Phil!

Really do appreciate this. Am going to be away for the rest of this week but if you are happy to discuss when you can I would be very grateful. I'll have a look at the code when I get back and will see what can be accomplished!

Fox

phildunlap

Certainly! There's a lot of meat missing from the bones!

but if you are happy to discuss when you can I would be very grateful.

Of course!