For many years, the question of high availability had always circled the same old subject of replication – how do we replicate data across nodes? how do we replicate the configuration to stay unified across nodes? Is active-active truly better than active-passive? and most importantly, what happens beyond the two node scenario?

Since the inception of the Linux-HA project (and I do believe it’s been around for years now – over 15 years), it has been the pivotal tool for creating Linux based high-availability clusters. Heartbeat, Stonith and Mon will take care of floating the IP numbers and services across – no biggy there, making sure the data is consistent across the board, that’s something completely different. Recently, one of the better known Asterisk Commercial offerings had launched an Asterisk-HA solution – it’s been long due – it’s just a shame it’s a commercial offering without an Open Source derivative, after all, it is Open Source based (I hope).

However, being a high availability solution on one hand, doesn’t mean you are truly a clustered solution – it is an active-passive solution, with a major caveat (at least as I see it), that if your data sync fails for some reason, you end up with a split-brain issue – and your entire solution is now made moot. Don’t get me wrong here, I think that for now, the solution is the next best thing to sliced bread, simply because there is no other solution out there. However, the fact this is the only solution, doesn’t make it the right solution.

What does federating mean in this respect? it means that data doesn’t need to be replicated across the board, it is automatically trickled across the network, making sure all nodes in the network have clear visibility for it. If a node fails inside the cluster, client automatically redirect themselves to a new node, no need for floating IP numbers. Call routing is automatically determined upon request and are never preset for the entire platform. And most importantly, the amount of data traversed between the nodes is as minimal as possible, preventing excessive usage of network resources and I/O.

What would it mean to federate the configuration of a PBX system? first of all, make sure each unit is capable of working on its own, information should be trickled across the nodes via two methodologies: A multicast/broadcast mechanism (for local LAN connected nodes) and a Published/Subscriber relation (for externally connected nodes). When a change is made to any of the systems, that change is then replicated to all the systems. The configuration is never fully transmitted between nodes (apart from a new node joining the cluster). Routing decisions are dynamically made across the network, they are not predetermined or preconfigured. There is no need to keep the cluster nodes in perfect physical alignment, mixing hardware specifications should be considered the norm. External devices should be able to “speak” to the cluster, without being aware of its existence.

Once we achieve all of the above, we’ll truly get to a point where we’ve clustered Asterisk (or another open source project) the right way.