Configuring the DRBD Reactor Promoter Plug-in's Freeze Feature

This article describes how to configure your environment, DRBD options, and promoter plug-in configuration file to freeze HA resources that may take a long time to start up.

You can use DRBD Reactor and its promoter plug-in to manage applications and services and make them highly available. Should a cluster node hosting your application fail or lose its connection to the other cluster nodes, DRBD Reactor will start the application service on another node.

The promoter plug-in's freeze feature is useful when a service in the promoter plug-in's start list, for example, a large database, takes a long time to start. If the node currently hosting the database resource loses its connection to the cluster, DRBD Reactor will freeze the resource on that node and attempt to start it on another node. If the original node regains its connection in the meantime, DRBD Reactor will unfreeze the resource and the node will continue to host it. After a brief network connection drop, your high-availability (HA) resource can therefore be back up and running in seconds rather than minutes.

More information about DRBD Reactor and its plug-ins can be found in the DRBD User's Guide. You can also get help through DRBD Reactor's various man pages, `--help`, and at the DRBD Reactor GitHub page.

Fulfilling the Promoter Plug-in's Requirements

Before you configure DRBD Reactor and its promoter plug-in's freeze feature, you will need to first verify and fulfill some requirements.

Verifying cgroups v2

The promoter plug-in's freeze feature requires cgroups v2. On newer Linux distributions,
such as RHEL 9 and Ubuntu 22.04, cgroups v2 is enabled by default. On older versions, you
may have to enable it manually.

Verify that your nodes have cgroups v2:

# ls /sys/fs/cgroup/cgroup.controllers

If this file is not present, either cgroups v2 is disabled or your Linux version does not support it.
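
If you want to script this check, for example to run it on several nodes, the following is a minimal sketch. The function name `has_cgroup_v2` is an assumption made for illustration. It relies only on the presence of the `cgroup.controllers` file described above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given directory looks like the root
# of a cgroups v2 hierarchy. Defaults to the standard mount point.
has_cgroup_v2() {
    cgroup_root="${1:-/sys/fs/cgroup}"
    # cgroups v2 exposes a cgroup.controllers file at the hierarchy root.
    [ -f "$cgroup_root/cgroup.controllers" ]
}

if has_cgroup_v2; then
    echo "cgroups v2 is available"
else
    echo "cgroups v2 is not available"
fi
```

The optional path argument exists only so that you can try the function against a test directory.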

Enabling cgroups v2

You can check whether your kernel supports cgroups v2 by entering the command `grep cgroup2 /proc/filesystems`. If the output includes `cgroup2`, you can proceed to enable the feature by using a kernel command line argument.

On RHEL-based systems, you can install the `grubby` package to make this easy:

# dnf -y install grubby

Next, enter the following commands to add the kernel argument and update GRUB's configuration:

# grubby --update-kernel=ALL --args=systemd.unified_cgroup_hierarchy=1
# grub2-mkconfig -o /boot/grub2/grub.cfg

On Ubuntu, you will need to edit the `/etc/default/grub` file and add the following line to the
appropriate kernel entry block for your system. Unless you have multiple operating systems or
kernels that you manage with GRUB, this should be the block that starts with the
`GRUB_DEFAULT=0` line.

GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1"

After adding the line to the `/etc/default/grub` file, you will need to enter the following
command to apply your changes:

# update-grub

NOTE: You can safely ignore any "Device or resource busy" warnings related to `device-mapper`
and `os-prober`. If you do not want to see these messages, you can add the
`GRUB_DISABLE_OS_PROBER=true` line to your `/etc/default/grub` file to prevent the operating system prober from searching for other operating systems. You likely do not need this feature unless you sometimes need to boot other operating systems.
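
To confirm that the kernel argument is present in the GRUB defaults file, you could use a small check such as the following sketch. The function name `grub_defaults_has_arg` is hypothetical. The check only looks at uncommented `GRUB_CMDLINE_LINUX=` lines, which is the variable used in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an uncommented GRUB_CMDLINE_LINUX= line
# in the GRUB defaults file contains the given kernel argument.
grub_defaults_has_arg() {
    arg="$1"
    grub_file="${2:-/etc/default/grub}"
    # index() does a literal substring search, so dots in the argument
    # are not treated as regular expression wildcards.
    awk -v arg="$arg" '
        /^GRUB_CMDLINE_LINUX=/ && index($0, arg) { found = 1 }
        END { exit !found }
    ' "$grub_file"
}

# Example call:
# grub_defaults_has_arg "systemd.unified_cgroup_hierarchy=1" && echo "argument is set"
```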

Repeat these steps on all your nodes, then reboot your nodes and verify that cgroups v2 is
enabled:

# ls /sys/fs/cgroup/cgroup.controllers
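
On nodes where you added the kernel argument, you might also confirm after the reboot that the running kernel was booted with it. The following is a sketch under that assumption. The function name `kernel_has_arg` is hypothetical. On distributions where cgroups v2 is enabled by default, the argument will not be present, and the check is not needed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel command line contains the
# given argument. Defaults to the running kernel's /proc/cmdline.
kernel_has_arg() {
    arg="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # -F: treat the argument as a literal string, -w: whole-word match only.
    grep -qwF -- "$arg" "$cmdline_file"
}

if kernel_has_arg "systemd.unified_cgroup_hierarchy=1"; then
    echo "kernel argument is active"
else
    echo "kernel argument not found on the running kernel's command line"
fi
```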

Configuring Required DRBD Options

The DRBD Reactor promoter plug-in's freeze feature also requires that the following DRBD options be set:

  • `on-no-quorum` set to `suspend-io`;
  • `on-no-data-accessible` set to `suspend-io`;
  • `on-suspended-primary-outdated` set to `force-secondary`;
  • and the DRBD `net` property `rr-conflict` set to `retry-connect`.

Use LINSTOR to set these properties on your LINSTOR resource by entering the following
commands on your LINSTOR controller node:

# linstor resource-definition drbd-options --on-no-quorum suspend-io <resource-def-name>
# linstor resource-definition drbd-options --on-no-data-accessible suspend-io <resource-def-name>
# linstor resource-definition drbd-options --on-suspended-primary-outdated force-secondary <resource-def-name>
# linstor resource-definition drbd-options --rr-conflict retry-connect <resource-def-name>
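
If you manage several resources, you could generate these commands from a single list rather than typing them each time. The following is a minimal sketch. The function name `freeze_option_commands` and the resource name `my_resource` are hypothetical, and the suspended-primary option is written as `on-suspended-primary-outdated`, which is the name used in the DRBD configuration man page. The function only prints the commands so that you can review them before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print the LINSTOR commands that set the DRBD options
# required by the freeze feature on one resource definition.
freeze_option_commands() {
    resource="$1"
    # Each input line holds one DRBD option name and the value it needs.
    while read -r option value; do
        echo "linstor resource-definition drbd-options --$option $value $resource"
    done <<EOF
on-no-quorum suspend-io
on-no-data-accessible suspend-io
on-suspended-primary-outdated force-secondary
rr-conflict retry-connect
EOF
}

# "my_resource" is a placeholder for your resource definition name.
freeze_option_commands my_resource
```

After reviewing the output, you can pipe it to `sh` on your LINSTOR controller node to run the commands.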

You can verify that your resource has these DRBD properties set by entering the command:

# linstor resource-definition list-properties <resource-def-name>

Configuring DRBD Reactor's Promoter Plug-in

Because DRBD Reactor's promoter plug-in will be controlling your HA resource, disable the DRBD `auto-promote` property on your resource by using the following LINSTOR command:

# linstor resource-definition drbd-options --auto-promote no <resource-def-name>

Your resource also needs the DRBD `quorum` property set to `majority`, but LINSTOR should have set this automatically when you spawned your resource.

You can verify your resource's properties by using the `linstor resource-definition list-properties <resource-def-name>` command.

Settings Within the Promoter Plug-in's Snippet File

To enable the promoter plug-in's freeze feature, add the following lines to the promoter plug-in's TOML snippet file, located by default in `/etc/drbd-reactor.d/`, for your HA resource:

on-drbd-demote-failure = "reboot"
on-quorum-loss = "freeze"
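
For context, the following sketch shows where these two lines might sit within a complete snippet file. The function name `write_promoter_snippet`, the resource name, and the services in the `start` list are hypothetical placeholders. Substitute the values from your own promoter configuration.

```shell
#!/bin/sh
# Hypothetical helper: write a promoter snippet that includes the freeze
# settings to the given path for the given DRBD resource name.
write_promoter_snippet() {
    snippet_path="$1"
    resource="$2"
    # The start list below is a placeholder; use your own mount and service units.
    cat > "$snippet_path" <<EOF
[[promoter]]
[promoter.resources.$resource]
start = ["var-lib-mysql.mount", "mariadb.service"]
on-drbd-demote-failure = "reboot"
on-quorum-loss = "freeze"
EOF
}

# Example call (writes to the default snippet directory, requires root):
# write_promoter_snippet /etc/drbd-reactor.d/my_resource.toml my_resource
```

After writing or changing a snippet file, restart or reload DRBD Reactor so that it picks up the new configuration.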

Created 2022/11/30 - MAT

Reviewed 2022/12/1 - MDK