Collecting Debugging Data When DRBD Hangs
DRBD® occasionally has bugs which cause it to become unresponsive.
For instance, a drbdsetup command hangs.
That is, it runs for an unexpectedly long time without showing any sign of making progress.
When this happens, it is necessary to collect special debugging data to help the DRBD developers understand and fix the bug.
First set the name of the resource which is hanging:
res=<resource_name>
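Before collecting anything, it can be worth confirming that the resource name matches a directory under the DRBD debugfs tree, because several later commands read from it. The following is a minimal sketch, not part of the official procedure: the function name check_drbd_debugfs and its optional second argument are hypothetical, and the default path is simply the one used by the commands in this article.

```shell
# Hypothetical helper: confirm the debugfs directory for a resource exists.
# $1 = resource name
# $2 = debugfs base directory (assumed default: the path used in this article)
check_drbd_debugfs() {
    res_name="$1"
    base="${2:-/sys/kernel/debug/drbd/resources}"
    if [ -z "$res_name" ]; then
        echo "resource name is empty" >&2
        return 1
    fi
    if [ ! -d "$base/$res_name" ]; then
        echo "no directory at $base/$res_name (is debugfs mounted, and are you root?)" >&2
        return 1
    fi
    echo "found $base/$res_name"
}
```

A call such as check_drbd_debugfs "$res" prints the directory it found, or returns a non-zero status with a message on standard error if the name is empty or the directory is missing.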
Then collect debugging data as follows. These commands can be copied directly into a shell and executed, or adapted to a shell script that you can run when needed.
# Trigger printing of blocked tasks to kernel log
echo w > /proc/sysrq-trigger
# Prepare environment
time=$(date +%s)
dir="$HOSTNAME.$res.$time"
prefix="$dir/$dir"
mkdir "$dir"
# Collect debug data
ps aux | grep drbd > "$prefix.ps"
for pid_file in $(find "/sys/kernel/debug/drbd/resources/$res/" -name "*_pid"); do
    pid=$(cat "$pid_file")
    echo "$pid_file: $pid"
    [ -n "$pid" ] && cat "/proc/$pid/stack"
done > "$prefix.stacks"
grep -r ^ /sys/kernel/debug/drbd/resources/$res/ > "$prefix.debugfs"
journalctl -k --since -1h > "$prefix.journal_recent"
# Optionally add --since to restrict this to the time since DRBD was last in a healthy state:
journalctl -k --grep "drbd $res" > "$prefix.journal_drbd"
# Create archive
tar czf "$dir.tgz" "$dir/"
echo "Collected debug data: $dir.tgz"
These commands create a tgz file.
Send this to the DRBD developers.
If you are a LINBIT® customer, you can open a support ticket and attach the file.
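Before sending the archive, it may help to check that the collected files are not empty. For example, the stacks file can end up empty if the commands were not run with sufficient privileges. The sketch below is one way to do that, not an official step: the function name report_collected_files is hypothetical, and it expects the directory held in $dir by the collection commands.

```shell
# Hypothetical helper: list collected files and flag any that are empty.
# $1 = directory created by the collection commands (the value of $dir)
report_collected_files() {
    target_dir="$1"
    if [ ! -d "$target_dir" ]; then
        echo "not a directory: $target_dir" >&2
        return 1
    fi
    # Show the size of every collected file
    ls -l "$target_dir"
    # Find regular files with no content
    empty_files=$(find "$target_dir" -type f -size 0)
    if [ -n "$empty_files" ]; then
        echo "WARNING: these files are empty:"
        echo "$empty_files"
        return 2
    fi
}
```

Running report_collected_files "$dir" returns status 2 and names the affected files if anything is empty, in which case repeating the collection as root may be appropriate.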
Written by JRC, 2025-10-23.
Reviewed by MAT, 2025-10-23.