Mango on AWS with HA architecture

ricardo

Hi,

Does anyone have experience implementing Mango on AWS with an HA architecture? We are working with an enterprise client that wants Mango on AWS with managed services. They also require an HA architecture to minimize service downtime.

BR,
Ricardo

phildunlap

Hi Ricardo,

Can you be explicit about what High Availability means in this context?

Someone may have more to add, but Mango currently cannot distribute instances behind a load balancer so that they all appear to be one instance to the client. Some forms of redundancy are possible, but providing an effective solution today would require understanding which kind of redundancy is the priority. One cannot easily make many Mangoes act as one. I believe it's on the roadmap, but it's a while away.

We have definitely experimented with using database backends that enable easy database clustering for data availability. At that point in development, we got the best results using our own NoSQL module on a single server with a bunch of memory.

ricardo

Hi Philip,

The client's system service availability requirement is below. Originally, I was hoping we could run multiple Mango instances behind a load balancer, as you described, but that is not available yet. Can we come up with a fail-over design that lets us quickly fail over to a passive instance when the primary Mango instance goes down?

System Service Availability

5.1 The service availability should meet 99.5% of the time, where "Service Availability" is defined as the total time in hours in a month for which the system can be connected and accessed, divided by the total service time in hours in a month (which is equal to the number of calendar days multiplied by 24 hours).

5.2 The maximum Down Time for the System per day should be 0.75 hour and the maximum Accumulated Down Time over a calendar month should be 3.65 hours where:
 "Down Time" is defined as the period for which the System CANNOT be accessed, other than due to maintenance and/or upgrade carried out
 "Accumulated Down Time" is defined as the accumulated number of hours of down time in a calendar month. This number will be reset to zero at the beginning of each calendar month

5.3 The maintenance window shall be agreed with the Employer after project commencement. Normally it shall be aligned with non-traffic hour of the Station.
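
For anyone checking the numbers in 5.1 and 5.2, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, and it assumes unplanned downtime has already been totalled per calendar day, with maintenance windows excluded as 5.2 describes.

```python
import calendar

AVAILABILITY_TARGET = 0.995    # requirement 5.1
MAX_DAILY_DOWNTIME_H = 0.75    # requirement 5.2, per day
MAX_MONTHLY_DOWNTIME_H = 3.65  # requirement 5.2, per calendar month


def service_hours(year, month):
    """Total service time in a month: calendar days multiplied by 24 hours."""
    days_in_month = calendar.monthrange(year, month)[1]
    return days_in_month * 24


def downtime_budget_hours(year, month, target=AVAILABILITY_TARGET):
    """Hours of downtime that still satisfy the availability target."""
    return service_hours(year, month) * (1 - target)


def meets_requirements(year, month, daily_downtime_hours):
    """Check 5.1 and 5.2 against a list of unplanned downtime hours per day."""
    total_down = sum(daily_downtime_hours)
    hours = service_hours(year, month)
    availability = (hours - total_down) / hours
    return (
        availability >= AVAILABILITY_TARGET
        and max(daily_downtime_hours, default=0) <= MAX_DAILY_DOWNTIME_H
        and total_down <= MAX_MONTHLY_DOWNTIME_H
    )
```

In a 30-day month the 99.5% target works out to 3.6 hours of downtime, slightly less than the 3.65-hour cap, so the percentage is the tighter limit. In a 31-day month it works out to 3.72 hours, so the 3.65-hour cap is the tighter one.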

ricardo

Guys,

Any ideas? Do you think an active-passive fail-over architecture for Mango is possible?

BR,
Ricardo
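
To make the question concrete, here is a rough Python sketch of only the detection half of an active-passive setup: deciding when to switch from the primary to the passive instance. It is not a Mango feature and does not call any Mango API. The class name, the `probe` callable, and the threshold are all hypothetical, and the sketch says nothing about keeping the passive instance's data in sync.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverMonitor:
    # Hypothetical health check: returns True when the primary responds.
    probe: Callable[[], bool]
    # Consecutive failed probes required before failing over (assumed value).
    failure_threshold: int = 3
    consecutive_failures: int = 0
    active: str = "primary"

    def tick(self) -> str:
        """Run one probe and return which instance should be serving."""
        if self.active == "passive":
            # Stay failed over; failing back is left as a manual step.
            return self.active
        if self.probe():
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active = "passive"
        return self.active
```

Requiring several consecutive failures avoids failing over on a single dropped request. If `tick()` were called every 30 seconds with a threshold of 3, detection would take roughly 90 seconds.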

phildunlap

I would say the closest we are to having a fail-over system right now would involve some development work to revive some of the NoSQL database integrations we've done in the past. That would generally lead to increased costs, either in licensing the database engine or in the extra disk and processing power some of those database engines require. It also means we're not necessarily in control of solving every issue encountered (which we still aren't, but this was a large part of why we kept improving our own NoSQL database).

It's completely possible for authenticated users to crash Mango by making too many requests, requests that are too large, or both. Give someone data source permission and there's even more potential for trouble.

That said, a single, well-managed instance can achieve the metrics specified in requirement 5 of your post. Thinking of your own deployments, do they already meet that standard?