Troubleshooting DRBD Reactor Promoter Plugin Services

If you have configured a DRBD Reactor promoter plugin resource to start services and you are having issues with the resource not coming up, you can take a few troubleshooting steps.

Verifying Promoter Plugin Resource Configuration

If you have a DRBD Reactor promoter plugin resource that is not coming up, first verify that the resource configuration snippet file is valid TOML. You can do this by using an online TOML validator or a local tool. Next, verify that the configuration file is the same on all your cluster nodes and that its file name ends in .toml, not .toml.disabled.
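As one possible local check, the following sketch validates a snippet file with the tomllib module from the Python standard library. It assumes that python3 version 3.11 or later is installed on the node. The function name validate_toml is a hypothetical helper for illustration, not part of DRBD Reactor.

```shell
# Hypothetical helper (not part of DRBD Reactor): exit 0 if the file parses
# as TOML, nonzero otherwise. Assumes python3 >= 3.11, which ships tomllib.
validate_toml() {
    python3 - "$1" <<'EOF'
import sys
import tomllib

path = sys.argv[1]
try:
    with open(path, "rb") as f:
        tomllib.load(f)
except tomllib.TOMLDecodeError as err:
    # sys.exit with a string prints it to stderr and exits with status 1
    sys.exit(f"{path}: invalid TOML: {err}")
EOF
}
```

For example, entering validate_toml /etc/drbd-reactor.d/<promoter-plugin-resource-name>.toml prints nothing for a valid file and an error message with the parser's explanation for an invalid one.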

You can enter ls /etc/drbd-reactor.d/ to show the contents of the default DRBD Reactor configuration snippet file directory. Alternatively, if you enter drbd-reactorctl status <promoter-plugin-resource-name> and the output shows something similar to the following, that could also indicate a disabled resource.

/etc/drbd-reactor.d/<promoter-plugin-resource-name>.toml:
Error: Could not read config snippets
Caused by: No such file or directory (os error 2)

If the resource is disabled, you can enable it by entering drbd-reactorctl enable <promoter-plugin-resource-name>.
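The file name check described above can be sketched as a small shell function. The name snippet_state and its arguments are hypothetical and only for illustration. The sketch assumes the default snippet directory, /etc/drbd-reactor.d, unless you pass another directory as the second argument.

```shell
# Hypothetical helper (not part of DRBD Reactor): classify a promoter
# snippet by which file name exists in the snippet directory.
snippet_state() {
    name=$1
    dir=${2:-/etc/drbd-reactor.d}   # default snippet directory
    if [ -f "$dir/$name.toml" ]; then
        echo enabled
    elif [ -f "$dir/$name.toml.disabled" ]; then
        echo disabled
    else
        echo missing
    fi
}
```

A result of disabled suggests entering the drbd-reactorctl enable command, while missing suggests that the snippet file was never copied to this node.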

If, after validating and enabling your promoter plugin resource, it still does not come up, there are a few more troubleshooting steps that you can take.

Troubleshooting a Promoter Plugin Resource Not Starting

First, it is important to know that a DRBD Reactor promoter plugin resource is “all-or-nothing” with respect to the systemd services that it manages. That is, if one of the resource’s managed services fails to start, then none of them will stay started. Even if the problematic service is late in the promoter plugin resource’s start list, services that are earlier in the list will not stay started and the promoter plugin resource will not come up.

Preparing DRBD Reactor Nodes for Troubleshooting

To prepare your DRBD Reactor nodes for troubleshooting, first disable the promoter plugin resource on all nodes except for the node that you want to troubleshoot. This way, no other node’s DRBD® state for the promoter plugin resource can affect the node that you will be working on.

# drbd-reactorctl disable --now <promoter-plugin-resource-name>
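If you have many nodes, you might script this step. The following is a minimal sketch that only builds the list of nodes to disable the resource on. The function name other_nodes, the node names, the resource name, and root SSH access between nodes are all assumptions for illustration.

```shell
# Hypothetical helper: print every node name except the one to keep.
# Usage: other_nodes <node-to-troubleshoot> <node>...
other_nodes() {
    keep=$1
    shift
    for node in "$@"; do
        [ "$node" = "$keep" ] || echo "$node"
    done
}

# Example use (node names, resource name, and SSH access are assumptions):
# for node in $(other_nodes node-a node-a node-b node-c); do
#     ssh "root@$node" drbd-reactorctl disable --now my_resource
# done
```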

Showing More Information About a Promoter Plugin Resource’s Managed Services

To show more information about a promoter plugin resource’s managed services, you can enter a drbd-reactorctl status --verbose <promoter-plugin-resource-name> command.

You can use this command to show both promoter plugin resource state and systemd service state information for the resource’s managed (start=) services.

Disabling the Promoter Plugin Resource on the Troubleshooting Node

On the node that you are troubleshooting, also disable the DRBD Reactor promoter plugin resource so that you can investigate its services in a granular manner.

# drbd-reactorctl disable --now <promoter-plugin-resource-name>

Isolating Problematic Services By Using the DRBD Reactor Control Tool

One way that you can isolate problematic services in your promoter plugin resource’s start list is by using the drbd-reactorctl tool’s start-until command. With this command, DRBD Reactor starts the services in the resource’s start list up to, and including, a service that you specify. This way, you get around the “all-or-nothing” aspect of the resource starting services.

📝 NOTE: You can either specify a service with the start-until command by its name or its index number. The index number is an integer value that represents a service’s place in the promoter plugin resource’s start list, where the first service in the list has an index number of 0, the next service has an index number of 1, and so on.

IMPORTANT: If you want to specify an OCF resource agent service with a start-until command, you need to reference the resource agent by its index number, rather than by name.
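To illustrate the zero-based numbering, the following sketch looks up the index of a service within a start list that you pass to it. The function name service_index is hypothetical, and you need to copy the start list entries from your own configuration snippet file. For the exact arguments that start-until accepts, refer to the drbd-reactorctl help output or manual page.

```shell
# Hypothetical helper: print the zero-based index of a service in a start list.
# Usage: service_index <service> <start-list-entry>...
service_index() {
    wanted=$1
    shift
    i=0
    for entry in "$@"; do
        if [ "$entry" = "$wanted" ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1   # the service is not in the list
}
```

For example, with an assumed start list of var-lib-linstor.mount followed by linstor-controller.service, entering service_index linstor-controller.service var-lib-linstor.mount linstor-controller.service prints 1.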

Output from the start-until command will be similar to the following:

systemctl start drbd-promote@linstor-db.service
systemctl start
systemctl start var-lib-linstor-intentional-mistake.mount
Failed to start var-lib-linstor-intentional-mistake.mount: Unit var-lib-linstor-intentional-mistake.mount not found.
Error: Return code not status success

Here, the error message directly follows the systemctl start line for the problematic service, which identifies the service that failed to start.

If you do not get any error messages after running a start-until command, then you have confirmed that services up to and including the service that you specified are not problematic. You can next focus your attention on later services in the start list.

Manually Starting Promoter Plugin Resource Services

Another way to troubleshoot a problematic promoter plugin resource’s services is to use the systemd control tool, systemctl, to start the services in the resource’s start list one at a time, beginning with the first service on the list, until you find the service that fails. A potential disadvantage to this approach is that you need to be aware of the implicit promoter service (drbd-promote@<drbd-resource-name>.service) and start that service first. For this reason, you might prefer to use the drbd-reactorctl start-until method.
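A minimal sketch of the manual approach is a loop that starts units in order and stops at the first failure. The function name start_in_order and the START_CMD override are hypothetical and exist only so that you can do a dry run without touching systemd. By default, the function calls systemctl start.

```shell
# Hypothetical helper: start units in the order given, stop at the first failure.
# START_CMD is an assumed override for dry runs; the default is "systemctl start".
start_in_order() {
    for unit in "$@"; do
        echo "Starting $unit"
        # Unquoted on purpose so that "systemctl start" splits into two words.
        if ! ${START_CMD:-systemctl start} "$unit"; then
            echo "Failed at: $unit" >&2
            return 1
        fi
    done
}
```

When you call the function, list the implicit promoter service first, followed by the entries of your start list in their configured order. For example: start_in_order drbd-promote@linstor-db.service var-lib-linstor.mount, where the mount unit name is an assumption.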

Listing Service Dependencies

If after trying to start a systemd service, there is a message about a dependency failure for the service, you can enter a systemctl cat command to list the service’s dependencies, as shown by Requires= entries in the unit file. For example:

# systemctl cat drbd-promote@nfs_share.service
[...]
[...]

Rather than showing the systemd service unit file by using a systemctl cat command, you can use the systemctl list-dependencies command to recursively show a service’s dependencies. Because of the recursive nature of the command, you might get a list of many service dependencies. Try to focus on service dependencies that are shown as inactive, as indicated by an open circle next to the service name.
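To narrow a long dependency list down to inactive entries, you could filter the output for the open circle marker. This sketch assumes a systemd version and a UTF-8 locale in which list-dependencies prints that marker, which might not hold on every distribution. The function name inactive_only is hypothetical.

```shell
# Hypothetical helper: keep only the lines marked with an open circle,
# which systemctl list-dependencies uses for inactive units.
# Reads the dependency list from standard input.
inactive_only() {
    grep '○'
}

# Example use (the unit name is taken from the example in this article):
# systemctl list-dependencies drbd-promote@nfs_share.service | inactive_only
```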

Investigating Service Dependencies

After listing the service’s dependencies, you can investigate whether service dependencies are started and are active.

For example, the Requires= entry in the example above indicates that the nfs_share promoter plugin service depends on a DRBD resource of the same name, nfs_share. You might next verify whether such a DRBD resource exists.

Showing Log Messages For a Service

If you encounter a service that does not start, you can enter a journalctl -xeu <name-of-service-that-does-not-start> command to show log messages related to the service and troubleshoot further based on the message content.

Restarting the Promoter Resource

After troubleshooting your DRBD Reactor promoter plugin resource’s start list of services, and fixing any issues, you can enable and then restart the promoter resource on all of your nodes.

IMPORTANT: If during the course of your troubleshooting, you modified any services or the promoter plugin resource’s configuration snippet file, match these changes on all of your DRBD Reactor nodes, before enabling and restarting the resource.
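One way to confirm that a file matches on all nodes is to compare checksums. The following sketch only implements the comparison. The function name all_same, the node names, the snippet file name, and SSH access between nodes are assumptions for illustration.

```shell
# Hypothetical helper: succeed only if every line read from standard input
# is identical (for example, one checksum per node).
all_same() {
    count=$(sort -u | wc -l)
    [ $((count)) -le 1 ]
}

# Example use (node names, file name, and SSH access are assumptions):
# for node in node-a node-b node-c; do
#     ssh "root@$node" sha256sum /etc/drbd-reactor.d/my_resource.toml | cut -d' ' -f1
# done | all_same && echo "snippet files match"
```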

You can enable and restart the promoter plugin resource in one command (on each node) by entering the following command:

# drbd-reactorctl restart --with-targets <promoter-plugin-resource-name>

Written by: MAT - 2024-03-20

Reviewed by: RCK - 2024-03-26