The Startup Behavior of a 2-Node Pacemaker Cluster

This article describes fencing considerations, quorum settings, and commands for sensible behavior in a 2-node Pacemaker with Corosync cluster.

Goals for a 2-node cluster:

Do not go online with stale data (replication case).
Do not cause a startup fencing loop.
Run Pacemaker primitive services exactly once (prevent IP conflicts, data corruption, and other issues).

To be allowed to start services, a node needs to be quorate, that is, the node needs to be a member of a cluster partition that has quorum. The node also must be certain that the respective Pacemaker primitive service is not (and cannot possibly be) running anywhere else.

Methodology:

Setting two_node: 1 within the quorum section of a Corosync configuration file enables 2-node cluster operations. Enabling this setting automatically enables the Corosync wait_for_all quorum option.

This is sensible behavior because quorum in a 2-node cluster is two nodes (50% of the votes + one). So on startup, a node will always wait for the other node, and only then become ready to provide services.
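The "50% of the votes + one" arithmetic can be sketched in a few lines of shell. This is only an illustration that assumes one vote per node; the function name quorum_threshold is hypothetical and is not part of Corosync.

```shell
# Majority threshold for a cluster with the given number of votes,
# assuming one vote per node: floor(votes / 2) + 1.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

quorum_threshold 2   # 2-node cluster: both votes are needed
quorum_threshold 3   # 3-node cluster: two of three votes are needed
```

With only two votes, a plain majority can never survive the loss of a node, which is why the two_node setting exists and why it is paired with waiting for the peer at startup.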

Pacemaker will then start, see both nodes in the membership, probe for current service status, and try to change the state of the world using start, stop, or possibly other actions based on configured policy.

Fencing considerations in a 2-node cluster

If the 2-node cluster now loses a node, the other node will continue to run services, or take over services. In a 2-node cluster, this presents a special problem as the 2-node cluster does not have real quorum (a simple majority in a cluster with an odd number of three or more nodes). If the 2-node cluster lost a node because of a communication problem, both nodes are alive, but from either node, the other node will appear to be unresponsive, and so each node needs to fence the other node before taking any further action. After a successful fencing operation, services cannot possibly run on the fenced node.

In this scenario, typically one of the nodes wins the race to fence the other node, and so the other node reboots, due to Pacemaker options that you would have configured as part of a typical setup. After the reboot, if communication is still down between the nodes and without the wait for all behavior, the node would, perhaps after some timeout, start Pacemaker. However, because communication between the nodes is down, the newly rebooted node would think that the other node is unresponsive, and try to fence the other node that you, as the omniscient global observer, know is happily running services. This pattern would repeat with changing roles, each time with the newly rebooted node fencing the other node.

You might consider not using fencing for your 2-node cluster. However, without fencing, and with data that used to be replicated but with no communication between the nodes, you would get diverging data sets.

Without fencing, and with shared data, you would get data corruption. With proper fencing configured on both the Pacemaker and DRBD® levels, you might get successful STONITH behavior, but then DRBD would still refuse to take over with only consistent or outdated data on one of the nodes.

With the implicit wait for all, the newly rebooted node will not become quorate, and so Pacemaker will not start services or fence the peer, until communication with the peer has been reestablished. This avoids the startup-and-fence-the-other-node repeating loop.

The remainder of this article repeats and rephrases the above information, with some subtleties and some additional commands that you might want to use in certain circumstances. If you have understood the startup behavior of a 2-node Pacemaker cluster from the article so far, you can stop reading here, or else continue reading to reinforce and deepen your understanding.

Reinforcing the concept by using different words

In Pacemaker with Corosync 2-node clusters, you should use the two_node: 1 quorum setting in your Corosync configuration file. Remember from earlier discussion that this Corosync configuration setting effects an implicit wait for all behavior for each node.
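As a minimal sketch, the relevant part of the Corosync configuration file might look like the following. This assumes the votequorum provider and the usual /etc/corosync/corosync.conf location; the other sections of the file (such as totem, nodelist, and logging) are omitted here.

```
quorum {
    provider: corosync_votequorum
    two_node: 1
}
```

There is no need to add wait_for_all: 1 explicitly in this sketch, because setting two_node: 1 enables it implicitly.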

You should also consider that no-quorum-policy=stop is the default setting in Pacemaker if you have not configured it differently in your Pacemaker configuration. If you have configured no-quorum-policy=freeze instead, the behavior described in this article in a 2-node cluster will be the same as for no-quorum-policy=stop.

So after startup, and without communication to the other node, the newly rebooted node does NOT have quorum, and without quorum, it will not fence the other node, start anything, or do anything else, really, because of the stop (or freeze) no-quorum-policy setting. Once communication is reestablished between the two nodes and the newly rebooted node has seen its peer (and so become quorate), you can then lose the peer (and, because of the two_node Corosync quorum setting, keep quorum).
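To see which state a node is in, you can inspect the output of the corosync-quorumtool -s command. The following is a small sketch of two helper functions that read that output on standard input. The function names are hypothetical, and the "Quorate:" and "Flags:" field labels are assumed to match the output of your corosync-quorumtool version.

```shell
# Helpers that read `corosync-quorumtool -s` output on standard input.
# Function names are hypothetical; the "Quorate:" and "Flags:" labels are
# assumptions about the output format of your corosync-quorumtool version.

# Succeeds (exit status 0) if the status text reports "Quorate: Yes".
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Prints the content of the "Flags:" line, for example to look for WaitForAll.
quorum_flags() {
    sed -n 's/^Flags:[[:space:]]*//p'
}
```

A possible use on a cluster node would be: corosync-quorumtool -s | is_quorate && echo "this node is quorate"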

Should you ever need to bring up an isolated single node, you can explicitly cancel the initial wait for all stage at runtime with the following command:

corosync-cmapctl -s quorum.cancel_wait_for_all u8 1

Of course, before doing this, you should confirm that this is the right thing to do in your specific situation by following some documented administrative best practices procedure that you should have in place around your data or services.

But properly configured DRBD might still prevent your node from going online, if your node suspects that its peer might have better data. If you, as an omniscient global observer, know better, then you can use the drbdadm primary --force command to manually try to have the node go online with outdated or possibly stale data, or even with inconsistent (but hopefully only slightly inconsistent) data. (An fsck command is strongly recommended here!)

Reiterating the point

The key point is that in a 2-node cluster, as in this case, Corosync (the communication and membership layer of Pacemaker clusters) is configured for two_node quorum behavior, which implies quorum.wait_for_all, as in the corosync-cmapctl command above. That means that you can shut down one node from a 2-node cluster, keep the other node running, and that should work just fine.

However, if you then choose to stop that single node as well, and restart it as an isolated single node, the wait_for_all Corosync setting will block the node from starting services because it will be waiting for its peer.

This behavior is by design, and in general a good thing. If you actually mean to bring up that single isolated node, and you know it has good data, and you know the other node is down, then you can explicitly cancel the initial wait for all stage with this command:

corosync-cmapctl -s quorum.cancel_wait_for_all u8 1

Even more reiteration

The recommended Corosync quorum setting for a 2-node Pacemaker with Corosync cluster is two_node: 1, which automatically enables an implicit wait_for_all: 1 quorum setting.

The consequence is that you have to bring both nodes up, and in communication with each other, before the cluster will start services. You can then lose either node, so long as the other node keeps running.
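The rule described above can be written down as a toy model in shell. This is an illustration of the behavior only, not Corosync's actual implementation, and the function name and arguments are hypothetical.

```shell
# Toy model of the two_node plus wait_for_all rule (illustration only,
# not Corosync's implementation).
#
# Usage: two_node_quorate SEEN_ALL VISIBLE_NODES
#   SEEN_ALL:      1 if this node has seen both nodes at least once since it
#                  started (or the wait was cancelled by an operator), else 0
#   VISIBLE_NODES: nodes currently in the membership, including this node
two_node_quorate() {
    seen_all=$1
    visible=$2
    if [ "$visible" -ge 2 ]; then
        echo quorate        # both nodes are present
    elif [ "$seen_all" -eq 1 ]; then
        echo quorate        # peer lost after being seen: quorum is kept
    else
        echo not-quorate    # fresh start, peer never seen: keep waiting
    fi
}

two_node_quorate 0 1   # isolated node right after boot
two_node_quorate 1 1   # node that lost its peer while running
```

The only difference between the two calls is whether the node has already seen its peer, which is exactly the distinction that prevents the fencing loop.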

If you have to boot a single, isolated node, and you know that this node has the most recent and good data, and you know that the other node is down, and will stay down, and you now want this node to start services, without bringing up the other node, you can cancel the wait, using the corosync-cmapctl command above.

If you disable the wait_for_all Corosync quorum setting, and set no-quorum-policy=ignore in your Pacemaker configuration, and get into a situation where fencing does work, but the cluster communication does not, then you might end up with two nodes repeatedly rebooting and fencing each other.

This is why this is NOT the default setup. This situation could be mitigated by not starting the cluster software on regular boot. But that would always require operator interaction after a reboot for any reason.

Created by MAT (based on original content by LE) - 2022-07-21

Reviewed by DJV 2022-07-25