What are some recommendable fencing/STONITH devices for Pacemaker?

We're often asked to recommend hardware that supports fencing/STONITH in Pacemaker. We try to avoid making specific recommendations since there are so many supported options, but we are aware of a few one-size-fits-all smart PDU options.

Server hardware often ships with a built-in mechanism that supports fencing/STONITH, such as Supermicro's IPMI, Dell's iDRAC, or HPE's iLO. If your server hardware doesn't have something like this built in, another option is a smart PDU or UPS, and you may be lucky enough to already have one!
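If you want to confirm that a built-in BMC is reachable over the network before relying on it for fencing, a quick sanity check might look like the following. This is only a sketch: the address and credentials are placeholders, and it assumes ipmitool is installed and the BMC's LAN interface is configured.

```shell
# Query the BMC's chassis power state over the network (placeholder
# address and credentials; substitute your BMC's values).
# A "Chassis Power is on" response means the BMC answers IPMI-over-LAN
# requests, which is what agents like fence_ipmilan depend on.
ipmitool -I lanplus -H 192.0.2.10 -U admin -P secret chassis power status
```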

If you don't have one and need to purchase one, the APC AP7900B is one that LINBIT trusts and has seen used in hundreds of Pacemaker clusters: https://www.apc.com/shop/us/en/products/Rack-PDU-Switched-1U-15A-100-120V-8-5-15/P-AP7900B

Newer units have been shipping with their network features disabled by default for security reasons. The instruction manual should cover this, and it's worth reading to make sure the firmware details haven't changed. To enable these features, connect via telnet or the serial console and run:

tcpip -i <ip-to-configure> -s <subnet> -g <gateway>
web -h enable
web -s enable
snmp -S enable -c1 private -a1 writeplus
snmp -S enable -c2 public -a2 writeplus
reboot -Y
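After the reboot, you can sanity-check the SNMP configuration from a cluster node. This is a sketch assuming net-snmp's snmpwalk utility is installed and <ip-pdu> is the address you configured above; the "private" community string is the one set in the snmp command earlier.

```shell
# Walk the standard system MIB using the community string configured above.
# Any response confirms SNMPv1 is enabled and the PDU is reachable, which
# is what the fence_apc_snmp agent relies on.
snmpwalk -v1 -c private <ip-pdu> system
```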

Once rebooted, you should be able to reach the PDU over the network and can configure it as a fencing device in Pacemaker. Adding the PDU fencing devices requires distinct "off" and "on" actions for each outlet feeding each node. With two nodes, each with two PSUs, this translates to eight commands. The "off" devices are monitored so that we're alerted if a PDU fails for some reason; there is no reason to monitor the "on" actions.

# Node 1 - off
pcs stonith create fence_node-a_pdu1_off fence_apc_snmp pcmk_host_list="node-a.linbit.com" ipaddr="<ip-pdu1>" delay="5" action="off" port="1" op monitor interval="60s"
pcs stonith create fence_node-a_pdu2_off fence_apc_snmp pcmk_host_list="node-a.linbit.com" ipaddr="<ip-pdu2>" delay="5" action="off" port="1" power_wait="5" op monitor interval="60s"
 
# Node 1 - on
pcs stonith create fence_node-a_pdu1_on fence_apc_snmp pcmk_host_list="node-a.linbit.com" ipaddr="<ip-pdu1>" action="on" port="1"
pcs stonith create fence_node-a_pdu2_on fence_apc_snmp pcmk_host_list="node-a.linbit.com" ipaddr="<ip-pdu2>" action="on" port="1"
 
# Node 2 - off
pcs stonith create fence_node-b_pdu1_off fence_apc_snmp pcmk_host_list="node-b.linbit.com" ipaddr="<ip-pdu1>" delay="5" action="off" port="2" op monitor interval="60s"
pcs stonith create fence_node-b_pdu2_off fence_apc_snmp pcmk_host_list="node-b.linbit.com" ipaddr="<ip-pdu2>" delay="5" action="off" port="2" power_wait="5" op monitor interval="60s"
 
# Node 2 - on
pcs stonith create fence_node-b_pdu1_on fence_apc_snmp pcmk_host_list="node-b.linbit.com" ipaddr="<ip-pdu1>" action="on" port="2"
pcs stonith create fence_node-b_pdu2_on fence_apc_snmp pcmk_host_list="node-b.linbit.com" ipaddr="<ip-pdu2>" action="on" port="2"
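With separate "off" and "on" devices per PDU, Pacemaker also needs to be told that all of a node's devices belong to a single fencing level, so that both "off" actions must succeed (cutting power from both PDUs) before the "on" actions restore power. A sketch, using the device and node names from the commands above:

```shell
# Group each node's four PDU devices into one fencing level. Pacemaker
# runs the devices in order: both "off" actions must succeed before the
# "on" actions re-power the outlets.
pcs stonith level add 1 node-a.linbit.com fence_node-a_pdu1_off,fence_node-a_pdu2_off,fence_node-a_pdu1_on,fence_node-a_pdu2_on
pcs stonith level add 1 node-b.linbit.com fence_node-b_pdu1_off,fence_node-b_pdu2_off,fence_node-b_pdu1_on,fence_node-b_pdu2_on
```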

Obviously, if you don't have redundant power supplies in each host, you can skip the corresponding command in each section above. However, you should always have more than one PDU: sharing a single PDU adds a single point of failure to an otherwise shared-nothing cluster.
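Before putting the cluster into production, it's worth verifying that fencing actually works. Note that this will power-cycle the target node, so only run it against a node that can safely go down:

```shell
# Confirm the fencing devices are configured and running, then manually
# fence node-b from node-a. node-b should lose power and reboot.
pcs stonith status
pcs stonith fence node-b.linbit.com
```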

MDK - 10/28/21